Preface



Over recent years a considerable amount of effort has been devoted, both in
industry and academia, towards the design and development of Asynchronous 
Transfer Mode (ATM) networks both for high speed broadband public information 
highways and for local and wide area private networks. Performance prediction 
and reliability analysis of these networks are extremely important in view of their 
ever expanding usage and the multiplicity of their component parts together with 
the complexity of their functioning. 

Experimental ATM networks have now been established worldwide, based on 
commercially available ATM products and switch architectures. However, there is 
still a set of many interesting and important research problems to be addressed and 
resolved before a global integrated broadband network infrastructure can be 
established. This includes traffic modelling and characterisation, flow and 
congestion control, routing and optimisation, ATM switch architectures and 
internetworking, resource allocation and the provision of specified quality of 
service. 

This book brings together twenty-seven research papers from industry and 
academia reflecting the latest original achievements in the theory and practice of
performance modelling and analysis of ATM networks worldwide. These papers 
are selected, subject to peer review, from those submitted at a later stage as 
expanded and revised versions out of ninety-one shorter papers presented at the 
Fifth IFIP Workshop on "Performance Modelling and Evaluation of ATM 
Networks", 21-23rd July, 1997, Craiglands Hotel, Ilkley, West Yorkshire, England, 
UK. At least three referees drawn from the Scientific Committee and other peers 
were involved in the evaluation process of each paper. 

The research papers are classified into eight parts covering the following topics: 
Traffic Modelling and Characterisation, Routing, Switch and Multiplexer Models, 
Call Admission Control (CAC), Congestion Control, Resource Allocation, Quality 
of Service (QoS) and Tools and Techniques.

Part One on "Traffic Modelling and Characterisation" brings together seven 
papers and is concerned with modelling and analysis of multiplexed streams of 
bursty and correlated traffic. New mathematical models are formulated and 
validated against traffic traces: they are based upon Pareto self-similar streams, 
Markov Modulated Poisson Processes (MMPPs), non-stationary traffic processes 
inducing self-similarity and characterisations of threshold-based and leaky bucket 
measurement-based traffic profiles. 




Part Two on "Routing" includes two papers addressing important routing
problems frequently encountered during the design and development of modern
communication networks. In this context, balancing of virtual path (VP) and
virtual channel (VC) routing in ATM networks is suggested, based on the
decomposition and aggregation of VPs and, moreover, a new hybrid
(analytic/simulation) approach is proposed in order to determine two routing
algorithms for optimal networks, based on the principle of deflexion.

Part Three on "Switch and Multiplexer Models" consists of five papers which
describe novel analytic methods for the analysis of queueing models of ATM
switches and multiplexers. They include a study of ATM buffer systems, based on
spectral analysis of rate matrices, an evaluation of a multiplexer with prioritised
service using non-homogeneous aperiodic Markov processes and the quantitative
analysis of input queueing of an interconnection network with parallel iterative
matching (PIM) scheduling between connected input and output ports.
Furthermore, new analytic techniques are proposed for determining long-range
traffic dependency and queueing behaviour of discrete-time on-off sources and also
studying the correlation structure of the output process of an ATM multiplexer
represented by a D-BMAP/D/1/N queue.

Part Four on "Call Admission Control" incorporates three papers investigating 
performance aspects of CAC schemes supporting different QoS classes and carries 
out performance comparisons based on analytic methods, measurements and 
simulation. Various CAC strategies with guaranteed grade of service are described
in terms of call blocking, the concept of shape-function and call-level and cell-level 
dynamics. 

Part Five on "Congestion Control" includes a single paper which applies linear 
control theory towards the design of congestion control algorithms in order to 
achieve full utilisation of network links without incurring cell-loss in ATM 
networks. 

Part Six on "Resource Allocation" brings together seven papers addressing the 
fundamental problems of resource allocation in ATM networks. Novel and 
efficient bandwidth assignment algorithms are proposed based on the estimation of 
cell loss probability for real time traffic, equalisation of blocking probability in 
systems with limited availability, multiplexing gains due to an independent 
combination of identical sources, explicit rate switches with ABR traffic and 
neural network-based prediction techniques.

Part Seven on "Quality of Service (QoS)" presents a single paper which analyses a
new fair cell discarding algorithm on a per-VC queueing basis via probabilistic
analysis and simulation under various traffic profiles and switch configurations.

Finally, Part Eight on "Tools and Techniques" presents four papers consisting of: 
a fast algorithm for the computation of waiting time moments for a 
D-BMAP/G/1/N queue, a cost-effective methodology, based on the principle of
maximum entropy, for the approximate analysis of arbitrary open queueing 
networks with finite capacity and Head-of-the-Line (HOL) service priorities under 
a partial buffer sharing scheme, a flexible PC-based ATM traffic generator and an 
efficient simulation-based environment for the performance evaluation of cellular 
networks. 

I would like to end this foreword by expressing my thanks to IFIP Technical 
Committee TC6 and Working Groups WG 6.3 on Performance of Communication 
Systems and WG 6.4 on High Performance Networking for sponsoring the Fifth 
Workshop on the Performance Modelling and Evaluation of ATM Networks (ATM 
'98), July 1997, Ilkley, UK. My thanks are also extended both to all supporting 
organisations and to the members of the Scientific Committee and external referees 
for their invaluable and timely reviews. 
PART ONE



Traffic Modelling and 
Characterisation 




A Model of Self-Similar Data Traffic 
Applied to Ethernet Traces 

J.J. Gordon^

CAPE Consulting Inc., Rumson, NJ 07760, U.S.A.

Tel +1-732-747-2479, Fax +1-732-741-1950, jjg@capecons.com

F. Huebner-Szabo de Bucs^ 

AT&T Labs, Holmdel, NJ 07733, U.S.A. 

Tel +1-732-949-7639, Fax +1-732-949-1720, fhuebner@att.com 



Abstract 

This paper proposes a simple four-parameter model of self-similar data traffic. The
basic element of the model is a single Pareto arrival stream, in which packet
interarrival times (IATs) are iid Pareto distributed. The model proposes that two such
streams be multiplexed or superimposed, to produce a composite stream in which the
marginal IAT distribution is Pareto-like, and which exhibits power law correlation not
only in block packet count (BPC) (the definition of self-similarity) but also in the IAT
process. The presence of long range IAT correlations creates the potential for the more
extreme queueing behavior associated with self-similar traffic.

We provide a simple parameter estimation procedure for the proposed model which
involves the mean and variance of the IATs, the Hurst parameter H, and the marginal
IAT distribution. Based on analysis of Ethernet traces collected by Bellcore, the
model is in good qualitative agreement with observed IAT distributions, and with the
corresponding IAT and BPC autocorrelation functions. In simulated queueing studies,
it reproduces the 'knee' in the mean delay curve for one Ethernet trace.

The proposed model exhibits the qualitative features present in self-similar Ethernet
traffic: Pareto-like IAT marginals, and power law IAT and BPC correlations. It is
conceptually simple, and easy to work with numerically (i.e., to simulate). It also
provides physical insight into the nature of long range correlations, and the
relationship between IAT and BPC correlations.

Keywords 

Traffic Modeling, Self-Similarity, Pareto Distribution, Ethernet Traces 



* Part of the work for this paper was done while the author was with Bellcore, Red Bank, NJ., U.S.A. 




1 INTRODUCTION 



Aside from the packet length distribution, there are two factors that determine the
queueing behavior of a packet stream entering a G/G/1 queue. The first is the
marginal interarrival time (IAT) distribution, which is characterized by its mean and
higher moments. The second is the correlation structure of the arrival process, which
can be described directly by the IAT autocorrelation function $\rho_{IAT}(k)$, or
indirectly by the autocorrelation function $\rho_{BPC}(k)$ of the 'block packet count'
(BPC) process. By definition, block packet counts $X_k$ are the numbers of packets
arriving during consecutive fixed length intervals or 'blocks' of time. The time series
$X_k$ is assumed here to be covariance stationary with autocorrelation function
$\rho_{BPC}(k)$. The functions $\rho_{IAT}(k)$ and $\rho_{BPC}(k)$ are clearly related,
though not in a readily expressible form.

In classical queueing models based on, e.g., phase-type distributions, the marginal IAT
distribution is the main determinant of queueing behavior. In general, correlations are
implicitly present in classical models. However, they decay exponentially fast and
have only a second order impact on queueing. In contrast, long range or power law
correlations are a fundamental property of self-similar traffic models. This type of
correlation can in some cases have a more significant impact on queueing behavior
than the variance and higher order moments of the IAT distribution. Given that
self-similarity is present in data traffic from a variety of sources, self-similar traffic
models are likely to play an important role in the design and cost-effective operation
of future data networks.

Fractional Brownian Motion (FBM) [5] is perhaps the simplest self-similar traffic 
model. FBM abandons the traditional framework of point arrival processes in favor of 
modeling the aggregate work process (i.e., the BPC process). It incorporates self- 
similarity through the Hurst parameter H. However, in FBM H is simply a parameter 
which is measured from data and plugged into the model. A more satisfactory model 
would derive the value of H from more physically intuitive properties of data traffic. 
In addition to this philosophical criticism, BPC-based models of data traffic encounter 
a significant practical problem. The measurement of H from BPC data appears to be 
a numerically unstable problem. To date, there are no reliable methods for extracting 
H from BPC data. Different BPC-based methods can produce markedly different H 
estimates [4]. Furthermore, even the same method can produce significantly different 
H estimates, depending on the range of data considered, etc. 

This paper proposes a self-similar traffic model based on multiplexed Pareto point
arrival processes. This model has several advantages relative to FBM and other
proposed models [6]-[8]. First, the proposed model appears to be in good agreement
with real Ethernet data, based on analysis of the marginal IAT distribution and
correlation structure. Second, the Pareto model provides insight into the nature of
self-similarity. In the proposed model, long range BPC correlations are fundamentally
due to the power law tail of the marginal IAT distribution. When two Pareto traffic
streams are multiplexed, some of the correlation is 'squeezed' out of the marginal's
tail and into the IAT sequence itself. That is, the tail of the marginal IAT distribution
becomes shorter, at the expense of introducing long range correlation into the
multiplexed IAT sequence. This relationship between the IAT and BPC correlation
structure may be a basic property of self-similar traffic.

Finally, based on this understanding of the relationship between IAT and BPC
correlation, the proposed model provides a method of measuring the Hurst parameter
H from the more fundamental IAT autocorrelation function. This method of
estimating H appears to be far more reliable than methods based on BPC analysis.



2 MULTIPLEXED PARETO MODEL 

As noted above, there are two properties of an arrival process that determine its
queueing behavior through a G/G/1 queue: its marginal IAT distribution and
correlation structure. Classical queueing models are typically concerned only with the
marginal IAT distribution, which is characterized by its first, second and perhaps
higher order moments. In self-similar models, due to the impact of long range
correlations, it is also necessary to model the correlation structure of the arrival
process. In the simplest formulation, this correlation structure is modeled solely
through the Hurst parameter H, which occurs as an exponent in the power law decay
of BPC correlations. For self-similar traffic, the asymptotic behavior of $\rho_{BPC}(k)$
is $\rho_{BPC}(k) \sim k^{2H-2}$, where k is the correlation lag and 1/2 < H < 1.

A contention of this paper is that in order to satisfactorily model the correlation
structure of an arrival process, it is necessary to use more than the single parameter H.
Note that H determines the asymptotic rate of decay of BPC correlations. It says
nothing about the absolute magnitude of these correlations. If $\rho_{BPC}(k)$ were
plotted on log-log axes, the asymptotic slope of the plot would be 2H - 2. However, it
seems intuitively reasonable that the intercept of this plot should also have an impact
on queueing. Furthermore, the BPC correlation structure is only half of the picture. In
general, an arrival process will also exhibit IAT correlations, and it is a priori unclear
whether IAT and BPC correlations should be modeled independently, or in some
combined fashion.

The model proposed here models both IAT and BPC correlations in terms of two more
fundamental parameters. This approach reflects the fact that IAT and BPC
correlations are closely related, and that specifying one implicitly determines the
other. The two parameters that model the correlation structure of the arrival process
can be thought of as determining the slope and intercept of a log-log plot of
$\rho_{BPC}(k)$. Alternatively, they can be thought of as determining the slope and
intercept of a log-log plot of $\rho_{IAT}(k)$. Both of these views are valid, and either
one provides a basis for estimating the model parameters from data. The proposed
self-similar model is based on the Pareto distribution, whose probability density
function (PDF) and complementary cumulative distribution function (CDF) are
defined as follows.

$f(t) = \Pr[T = t] = \alpha\beta^{\alpha} / (\beta + t)^{\alpha + 1}$    (1)

$F_c(t) = \Pr[T > t] = \beta^{\alpha} / (\beta + t)^{\alpha}$    (2)



Note that the kth moment of the density (1) is finite only for $k < \alpha$. In this
paper we consider only cases for which $\alpha > 1$, so that a finite mean exists.
Consider a single Pareto stream in which IATs are iid with density (1). In reference [1]
it was shown that the Hurst parameter H associated with this process is

$$H = \begin{cases} (1 + \alpha)/2, & 0 < \alpha < 1 \\ (3 - \alpha)/2, & 1 < \alpha < 2 \\ 1/2, & \alpha > 2 \end{cases} \qquad (3)$$



(Note that integer values of $\alpha$ require special treatment.) A single Pareto stream
therefore provides a simple two-parameter model of self-similar traffic. A value of H
in the range 1/2 < H < 1 maps into a unique value of $\alpha$ in the range
$1 < \alpha < 2$, and the parameter $\beta$ would be chosen to match the mean arrival
rate $\lambda$: $\beta = (\alpha - 1)/\lambda$.
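The single-stream model can be illustrated with a minimal Python sketch. It maps a Hurst parameter in (1/2, 1) to the Pareto exponent using the middle branch of equation (3), picks the scale parameter to match a mean arrival rate, and draws IATs by inverting the complementary distribution (2). The function names and the choice of inverse-transform sampling are assumptions made for illustration; the paper does not say how its samples were generated.

```python
import random


def pareto_params_from_hurst(hurst, rate):
    """Map H in (1/2, 1) and a mean arrival rate to (alpha, beta)."""
    if not 0.5 < hurst < 1.0:
        raise ValueError("H must lie strictly between 1/2 and 1")
    alpha = 3.0 - 2.0 * hurst      # inverse of H = (3 - alpha)/2 in eq. (3)
    beta = (alpha - 1.0) / rate    # mean IAT beta/(alpha - 1) equals 1/rate
    return alpha, beta


def sample_iat(alpha, beta, rng=random):
    """Draw one IAT with Pr[T > t] = beta**alpha / (beta + t)**alpha."""
    u = 1.0 - rng.random()         # uniform on (0, 1], so u is never zero
    # Solve (beta / (beta + t))**alpha = u for t (inverse-transform sampling).
    return beta * (u ** (-1.0 / alpha) - 1.0)
```

For example, `pareto_params_from_hurst(0.92, rate)` returns an exponent of 1.16, the value quoted for trace 1 in section 4.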

As one might expect, this simple model is not sufficiently flexible to model real data.
However, it does have two points in its favor. First, it suggests a 'physical' basis for
self-similarity: long range correlations are a consequence of the power law tails of
packet IAT distributions. Second, real packet data traces do indeed appear to have
power law IAT distributions. Pareto-type processes are therefore attractive candidates
for modeling self-similar data. The above points can be incorporated into a more
sophisticated self-similar model. Consider a multiplexed stream of two Pareto
processes, having parameters $\alpha_1$, $\beta_1$ and $\alpha_2$, $\beta_2$. In the
component Pareto streams, IATs are iid with density (1). In the combined stream, IATs
will in general be correlated (i.e., no longer iid), and the marginal IAT distribution can
be shown to have the following Pareto-like form



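$$F_c(t) = \frac{\lambda_1}{\lambda_1 + \lambda_2} \, \frac{\beta_1^{\alpha_1}}{(\beta_1 + t)^{\alpha_1}} \, \frac{\beta_2^{\alpha_2 - 1}}{(\beta_2 + t)^{\alpha_2 - 1}} + \frac{\lambda_2}{\lambda_1 + \lambda_2} \, \frac{\beta_1^{\alpha_1 - 1}}{(\beta_1 + t)^{\alpha_1 - 1}} \, \frac{\beta_2^{\alpha_2}}{(\beta_2 + t)^{\alpha_2}} \qquad (4)$$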



where $\lambda_1 = (\alpha_1 - 1)/\beta_1$ and $\lambda_2 = (\alpha_2 - 1)/\beta_2$. The
first term in eqn. (4) is obtained by considering arrivals in stream 1, and requiring that
the IAT in stream 1 and the residual IAT in stream 2 both exceed t. The second term is
obtained by applying the same requirements to stream 2. In either stream, the residual
IAT density is equal to $F_c(t) \cdot (\alpha - 1)/\beta$, that is, $F_c(t)$ divided by the
mean IAT $\beta/(\alpha - 1)$, where $F_c(t)$ is given by eqn. (2). (This is a standard
result, discussed in [9].) Note that for large t, $F_c(t) \sim t^{-\alpha}$ where
$\alpha = \alpha_1 + \alpha_2 - 1$. That is, the distribution (4) again has a power law
tail, with the exponent being jointly determined by $\alpha_1$ and $\alpha_2$.
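The construction of the multiplexed marginal described above can be sketched in Python. The sketch assumes that the two streams are independent and that each term is weighted by the fraction of arrivals belonging to the corresponding stream, which is one reading of the verbal derivation; the function names are illustrative and are not taken from the paper.

```python
def pareto_ccdf(t, alpha, beta):
    """Pr[T > t] for a single Pareto stream, eq. (2)."""
    return (beta / (beta + t)) ** alpha


def residual_ccdf(t, alpha, beta):
    """Pr[residual IAT > t]: the residual density integrated from t to infinity."""
    return (beta / (beta + t)) ** (alpha - 1.0)


def multiplexed_ccdf(t, alpha1, beta1, alpha2, beta2):
    """Marginal IAT complementary distribution of two multiplexed Pareto streams."""
    lam1 = (alpha1 - 1.0) / beta1
    lam2 = (alpha2 - 1.0) / beta2
    p1 = lam1 / (lam1 + lam2)      # assumed weight: share of arrivals from stream 1
    # Arrival from stream 1: its own IAT and stream 2's residual IAT both exceed t.
    term1 = p1 * pareto_ccdf(t, alpha1, beta1) * residual_ccdf(t, alpha2, beta2)
    # Arrival from stream 2: the same requirement with the roles swapped.
    term2 = (1.0 - p1) * pareto_ccdf(t, alpha2, beta2) * residual_ccdf(t, alpha1, beta1)
    return term1 + term2
```

Evaluating `multiplexed_ccdf` at two widely separated large values of t and taking the log-log slope gives a numerical check of the tail exponent $\alpha_1 + \alpha_2 - 1$ stated in the text.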

Multiplexing two Pareto traffic streams leads to a shorter tail of the composite
marginal distribution at the expense of introducing long range correlation into the
multiplexed IAT sequence. This effect is shown in Figure 1 in which the
complementary distribution functions $F_c(t)$ of two Pareto traffic streams with
$1 < \alpha_i < 2$ (dotted lines) and the corresponding $F_c(t)$ of the multiplexed
stream (solid line) are depicted. It can be clearly seen that the tail of the superimposed
traffic stream is considerably shorter than the tails of the two individual streams. We
suspect that this relationship between the shortening of heavy-tailed distributions (by
multiplexing them) and the introduction of long range IAT correlations may be a basic
property of self-similar traffic.
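A multiplexed IAT sequence of this kind can be generated with a short simulation. The following Python sketch superimposes two independent Pareto renewal streams and returns the IATs of the merged stream. As a simplifying assumption, both streams start with an arrival epoch measured from time zero rather than from a stationary residual time; the helper names and the fixed seed are illustrative only.

```python
import random


def pareto_arrivals(alpha, beta, horizon, rng):
    """Arrival epochs of one Pareto renewal stream on (0, horizon]."""
    times, t = [], 0.0
    while True:
        u = 1.0 - rng.random()                    # uniform on (0, 1]
        t += beta * (u ** (-1.0 / alpha) - 1.0)   # inverse-transform Pareto IAT
        if t > horizon:
            return times
        times.append(t)


def multiplexed_iats(alpha1, beta1, alpha2, beta2, horizon, seed=0):
    """IATs of the superposition of two independent Pareto streams."""
    rng = random.Random(seed)
    merged = sorted(pareto_arrivals(alpha1, beta1, horizon, rng)
                    + pareto_arrivals(alpha2, beta2, horizon, rng))
    return [later - earlier for earlier, later in zip(merged, merged[1:])]
```

The returned list can be fed to an empirical distribution or autocorrelation estimate to compare the merged stream with its two components.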




Figure 1 Tail behaviour of individual and multiplexed Pareto streams 



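In general, one cannot explicitly determine the autocorrelation functions
$\rho_{IAT}(k)$ and $\rho_{BPC}(k)$ associated with the multiplexed stream. However,
the following facts can be shown to hold. First, the Hurst parameter H in the
multiplexed stream is equal to the larger of the Hurst parameters $H_1$ and $H_2$
associated with the component streams. Let $X_n^{(1)}$ and $X_n^{(2)}$ be the block
packet counts in the component streams, and $\rho_{BPC}^{(1)}(k)$ and
$\rho_{BPC}^{(2)}(k)$ be the corresponding BPC autocorrelation functions. The
BPC autocorrelation function in the multiplexed stream is

$$\rho_{BPC}(k) = \frac{\mathrm{cov}[X_n^{(1)} + X_n^{(2)},\; X_{n+k}^{(1)} + X_{n+k}^{(2)}]}{\mathrm{var}[X_n^{(1)} + X_n^{(2)}]} = \frac{\mathrm{cov}[X_n^{(1)}, X_{n+k}^{(1)}] + \mathrm{cov}[X_n^{(2)}, X_{n+k}^{(2)}]}{\mathrm{var}[X_n^{(1)}] + \mathrm{var}[X_n^{(2)}]} = \omega_1 \rho_{BPC}^{(1)}(k) + \omega_2 \rho_{BPC}^{(2)}(k) \qquad (5)$$

where $\omega_1 = \mathrm{var}[X_n^{(1)}] / \mathrm{var}[X_n^{(1)} + X_n^{(2)}]$ and
$\omega_2 = \mathrm{var}[X_n^{(2)}] / \mathrm{var}[X_n^{(1)} + X_n^{(2)}]$. The second
equality uses the independence of the two component streams, so that the cross
covariances vanish. It follows from (5) that $\rho_{BPC}(k)$ will decay as the slower of
$\rho_{BPC}^{(1)}(k)$ and $\rho_{BPC}^{(2)}(k)$, so that $H = \max(H_1, H_2)$.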

Finally, the power law decay of $\rho_{IAT}(k)$ can be related to H and
$\rho_{BPC}(k)$ as follows. In [1] it was shown that in self-similar traffic, the mean
arrival rate $\lambda(t)$ measured across a finite interval of time [0, t] converges to its
long term mean value
X{t) (6) 

Assume that $\rho_{IAT}(k)$ is of the asymptotic form $k^{-\gamma}$. This assumption
appears to be valid for real traffic (see section 3), and also for the model outlined
above (see reference [2]). Based on results in [1], it follows that the mean IAT
$\bar{x}(n)$ across a finite sample of n IATs converges to its long range mean value
$\bar{x} = \lambda^{-1}$ as $n \to \infty$:

x(n)~x (7) 

For consistency, the exponents in the power law convergence relations for
$\lambda(t)$ and $\bar{x}(n)$ in equations (6) and (7) must be identical. We therefore
conclude that H is related to $\gamma$ via the following simple equation

$H = 1 - \gamma/4$    (8)

This equation expresses a basic relationship between the power law convergence
behavior of $\rho_{IAT}(k)$ and $\rho_{BPC}(k)$. As explained in section 3, it forms a
basis for estimating the Hurst parameter H via analysis of IAT data. This method of H
estimation appears to be far more reliable than alternative methods based on direct
analysis of BPC data. Note that equation (8) assumes $\rho_{IAT}(k)$ to be non-zero.
This appears to be the case for real traffic. For single iid Pareto streams, in which
$\rho_{IAT}(k) = 0$ for k > 0, equation (8) will still hold, but in a modified form. In that
case, the infinite variance of the IAT distribution will ensure that $\bar{x}(n)$
converges to its long term mean value according to a power law.
3 PARAMETER ESTIMATION PROCEDURE



The multiplexed Pareto model described in section 2 contains four parameters:
$\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$. The proposed method for estimating
these parameters is as follows:

1. Plot the IAT autocorrelation function $\rho_{IAT}(k)$ on log-log axes and
measure its asymptotic slope $\gamma$. The Hurst parameter H can then be
calculated via equation (8).

2. Without loss of generality, assume that $\alpha_1 < \alpha_2$ and
$1 < \alpha_1 < 2$. (Note that $\alpha_2$ could be greater than two.) Based on
equation (3), and the comments at the end of section 2, it follows that
$H = H_1 > H_2$. The value of $\alpha_1$ is therefore given by:

$\alpha_1 = 3 - 2H = 1 + \gamma/2$    (9)

3. Plot the IAT marginal distribution on log-log axes, and measure its
asymptotic slope $\eta$. This slope determines $\alpha_2$ via the relation:

$\alpha_2 = \eta - \alpha_1 + 1$    (10)

4. Determine the ratio of the arrival rates $\lambda_1 = (\alpha_1 - 1)/\beta_1$ and
$\lambda_2 = (\alpha_2 - 1)/\beta_2$ in the component Pareto streams via the equation
$\lambda_1/\lambda_2 = \kappa$. The parameters $\beta_1$ and $\beta_2$ can then be
calculated in terms of the overall arrival rate $\lambda = \lambda_1 + \lambda_2$ and
the artificial parameter $\kappa$:

$\beta_1 = \frac{1 + \kappa}{\kappa} \cdot \frac{\alpha_1 - 1}{\lambda}$    (11)

$\beta_2 = (1 + \kappa) \cdot \frac{\alpha_2 - 1}{\lambda}$    (12)

5. Find the value of $\kappa$ which best fits the data. The recommended means of
doing this is to select the value of $\kappa$ which matches the variance of the
IAT distribution. In practice, this involves numerical iteration.

The above estimation procedure requires three degrees of freedom to match the Hurst
parameter H, and the mean and variance of the marginal IAT distribution. Note that
H determines the slope of a log-log plot of $\rho_{BPC}(k)$ or $\rho_{IAT}(k)$. Ideally,
one would like to use the fourth degree of freedom to match the intercept of these
plots. Then two parameters would be used to fit the arrival process' IAT distribution,
and two would be used to fit its correlation structure. However, directly matching the
log-log intercept of either $\rho_{BPC}(k)$ or $\rho_{IAT}(k)$ appears to involve
numerical difficulties, and is not recommended here.
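Steps 1 to 4 of the procedure reduce to a few lines of arithmetic, sketched below in Python. The sketch assumes that $\gamma$ and $\eta$ are supplied as positive decay exponents (the magnitudes of the measured log-log slopes) and that the scale parameters follow from $\lambda_1/\lambda_2 = \kappa$ and $\lambda = \lambda_1 + \lambda_2$. The function name and the returned dictionary are illustrative choices. Step 5, the iteration on $\kappa$ to match the IAT variance, is deliberately left out.

```python
def fit_pareto_model(gamma, eta, rate, kappa):
    """Steps 1-4: map (gamma, eta, overall rate, kappa) to the model parameters."""
    hurst = 1.0 - gamma / 4.0                  # eq. (8)
    alpha1 = 3.0 - 2.0 * hurst                 # eq. (9), equals 1 + gamma/2
    alpha2 = eta - alpha1 + 1.0                # eq. (10)
    # With lambda1 = kappa*rate/(1 + kappa) and lambda2 = rate/(1 + kappa),
    # beta_i = (alpha_i - 1)/lambda_i gives the two scale parameters.
    beta1 = (1.0 + kappa) / kappa * (alpha1 - 1.0) / rate
    beta2 = (1.0 + kappa) * (alpha2 - 1.0) / rate
    return {"H": hurst, "alpha1": alpha1, "alpha2": alpha2,
            "beta1": beta1, "beta2": beta2}
```

Whatever value of $\kappa$ is tried, the component rates recovered from the returned parameters sum to the overall rate, so the mean IAT is matched by construction and only the variance remains to be tuned.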

There are several criteria that could be used to determine $\kappa$ in step 5. For
example, instead of matching the variance of the IAT distribution, one could instead
select $\kappa$ to match the asymptotic intercept of a log-log plot of the IAT
distribution. Whatever method is employed, the fitted model should ideally reproduce
both the asymptotic slope and intercept of $\rho_{BPC}(k)$ and $\rho_{IAT}(k)$. That is,
it should accurately model both the asymptotic magnitude and rate of decay of
$\rho_{BPC}(k)$ and $\rho_{IAT}(k)$. The Pareto model may be judged successful if it
accurately reproduces these quantities.



4 PARAMETER ESTIMATION - MARGINAL IAT DISTRIBUTION

The above model was applied to three Ethernet traces which were collected by 
Bellcore, and which are described in detail in [3]. These traces can be assumed to be 
representative of LANs supporting text processing, email and scientific programming 
applications for many users. Traces 1-3 consisted of 968631, 1359656, and 643454 
packet arrival events respectively, each representing one hour of recording time. Our 
analysis of these traces addresses the arrival process only. Packet length statistics are 
not considered. 




Figure 2 Interarrival time autocorrelation functions 
As described in section 3, the first step in fitting the model to each trace is to plot the
IAT autocorrelation function $\rho_{IAT}(k)$. Figure 2 shows plots of $\rho_{IAT}(k)$
for traces 1-3 on log-log axes. Note that for clarity of presentation, traces 2 and 3 are
offset from trace 1 on the y-axis by factors of 0.1 and 0.01 respectively. Also shown in
Figure 2 are the linear approximations to $\rho_{IAT}(k)$. In each case, a least squares
fit was obtained for lags k = 16 to k = 1024. The corresponding H estimates obtained
from equation (8) were 0.92, 0.95, and 0.95 respectively.
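The fitting step can be sketched in Python as follows. The sketch computes a sample autocorrelation, fits a straight line to its logarithm against the logarithm of the lag by ordinary least squares, and converts the fitted decay exponent to H via equation (8). The function names are illustrative, the biased autocovariance estimator is an assumption, and the fit requires all supplied autocorrelation values to be positive; none of these details are stated in the paper.

```python
import math


def sample_acf(series, lag):
    """Sample autocorrelation of a sequence at one lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag)) / n
    return cov / var


def loglog_slope(xs, ys):
    """Least squares slope of log(y) against log(x); all values must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


def hurst_from_iat_acf(lags, rho):
    """Estimate H from IAT autocorrelations via eq. (8), H = 1 - gamma/4."""
    gamma = -loglog_slope(lags, rho)   # rho(k) ~ k**(-gamma)
    return 1.0 - gamma / 4.0
```

In use, `rho` would hold `sample_acf(iats, k)` for each lag k in the fitting range, for instance k = 16 to k = 1024 as in Figure 2.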

This method of H estimation appeared to be far more straightforward and reliable
than methods based on BPC analysis. Consider, for example, the 'log-variance'
method of H estimation. This method requires one to plot the variance of the
aggregated block packet counts $X_k^{(m)} = (X_{km-m+1} + \cdots + X_{km})/m$ versus
m on log-log axes. In self-similar traffic, the variance of $X_k^{(m)}$ decays
asymptotically as $m^{2H-2}$. Consequently, one can in principle estimate H from the
asymptotic slope of the log-variance plot.
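For comparison, the log-variance method can be sketched in Python as below. The sketch averages the block packet counts over non-overlapping groups of m blocks, computes the variance of the averaged series for each m, and fits the log-log slope, which equals 2H - 2 under the stated asymptotics. The function names and the choice of block sizes are assumptions for illustration, and incomplete trailing groups are simply discarded.

```python
import math


def aggregated_variance(counts, m):
    """Variance of the series obtained by averaging counts over blocks of size m."""
    blocks = len(counts) // m
    means = [sum(counts[i * m:(i + 1) * m]) / m for i in range(blocks)]
    mu = sum(means) / blocks
    return sum((x - mu) ** 2 for x in means) / blocks


def hurst_log_variance(counts, block_sizes):
    """Estimate H from the slope of log variance against log m."""
    lx = [math.log(m) for m in block_sizes]
    ly = [math.log(aggregated_variance(counts, m)) for m in block_sizes]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    slope = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
             / sum((a - mx) ** 2 for a in lx))
    return 1.0 + slope / 2.0           # slope = 2H - 2
```

For uncorrelated counts the estimate should be close to 1/2; values well above 1/2 indicate long range dependence.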

In practice, this method appears to significantly under-estimate H. Figure 3 shows
log-variance plots for traces 1-3. The H estimates derived from these plots were 0.84,
0.86 and 0.87. When substituted into the Pareto model and simulated, these estimates
produce significantly shorter queues than those derived from $\rho_{IAT}(k)$.
Since the $\rho_{IAT}(k)$ estimates themselves appear to under-estimate the queue
lengths obtained from the traces, we conclude that the log-variance method produces
H estimates that are too low.




Figure 3 Log-variance plots
Other BPC-based H estimates are more obviously unreliable. Figure 4 shows a plot
of the periodogram of $\rho_{BPC}(k)$ for trace 1. For self-similar traffic, the
periodogram should exhibit 1/f noise. That is, it should diverge as $\omega^{1-2H}$ as
the frequency $\omega \to 0$. In principle, this provides another method of estimating
H. However, the curve in Figure 4 is so variable or 'noisy' that no reliable estimate of
H can be obtained. It appears that the BPC process is so volatile that it will not produce
reliable H estimates over any realistic measurement interval. This problem affects all
BPC-based methods of H estimation. For reasons that are not entirely clear, the
problem does not affect IAT-based H estimation. Equation (8) therefore provides a
more reliable basis for estimating H.






Figure 4 Block packet count periodogram 



According to equation (9), the H estimates derived from Figure 2 imply the following
values for $\alpha_1$: 1.16, 1.10 and 1.10. The next step in the estimation procedure is
to calculate $\alpha_2$ from equation (10). Figure 5 shows plots of the marginal IAT
distribution for each trace, plotted on log-log axes (solid lines).
Figure 5 Marginal interarrival time distributions

Note that the function plotted in Figure 5 is the complementary distribution function
$F_c(t) = \Pr[T > t]$. This function is more convenient to work with than the
probability density function $f(t) = \Pr[T = t]$, since $F_c(t)$ integrates out the
measurement 'noise' in $f(t)$. Also note that for clarity of presentation, the marginals
for traces 2 and 3 are offset from trace 1 by factors of 0.1 and 0.001 respectively. The
slope $\eta$ of the marginals' asymptotes is used to calculate $\alpha_2$ via equation
(10). The $\alpha_2$ estimates obtained in this way are $\alpha_2$ = 4.069, 5.961, and
4.397.

Given that estimates for $\alpha_1$ and $\alpha_2$ have been obtained, the final step in
the parameter estimation procedure is to express $\beta_1$ and $\beta_2$ in terms of
$\kappa$ and $\lambda$ as in equations (11) and (12), and find the value of $\kappa$
which matches the variance of the marginal IAT distribution. This involves numerical
iteration on $\kappa$. The parameter values derived from this procedure are given in
Table 1, together with the corresponding values of the mean IAT $\bar{x}$, the IAT
variance $\sigma^2$, and H. The corresponding fitted marginal distributions, as given
by equation (4), are superimposed on the actual distributions in Figure 5 (dotted lines).
In all cases the Pareto model provides a good fit to the marginal IAT distribution.
Table 1 Parameters for the Fitted Pareto Model

Parameter                  Trace 1    Trace 2    Trace 3
H                          0.92       0.95       0.95
$\bar{x}$ (ms)             3.716      2.646      5.594
$\sigma^2$ (ms^2)          51.323     21.680     63.202
$\alpha_1$                 1.16       1.10       1.10
$\alpha_2$                 4.069      5.961      4.397
$\beta_1$ (ms)             1.046      0.514      3.108
$\beta_2$ (ms)             26.420     27.061     23.174


Several observations can be made regarding the fitted distributions. First, based on
Figure 5 it appears that the fitted Pareto distributions are 'flatter' than the actual
distributions, i.e., less concave. A possible explanation for this is as follows. When a
large number of Pareto processes are multiplexed, the resultant IAT distribution
behaves like an exponential distribution for small t, while retaining a power law tail
for large t. (The arrival process does not become Poisson, since long range
correlations remain.) This type of distribution will have a more concave shape, as in
Figure 5. It is possible that the Ethernet traffic is in practice generated by a large
number of Pareto-like sources. Modeling this data in terms of just two Pareto sources
is merely a useful mathematical approximation.

Second, the fitted distributions extend to infinity, whereas the actual distributions are
truncated at some finite value of t. Note that in all cases, the point at which the actual
distribution is truncated (i.e., becomes zero) is well out into the tail of the distribution.
Trace 1, for example, has a mean IAT of 0.0042 seconds, while its IAT distribution
extends out to 0.4 seconds. It follows that the fitted Pareto distributions model the
actual distributions over a very large range of t, certainly several orders of magnitude
beyond the mean.

Third, the plots in Figure 5 exhibit a jump or discontinuity at around $t \approx 0.0042$
seconds. The origin of the jump is unclear, though it may be related to Ethernet packet 
segmentation or acknowledgment times. Regardless of its origin, this jump has 
negligible impact on queueing. 
5 PARAMETER ESTIMATION - CORRELATION STRUCTURE

Figure 6 shows the IAT autocorrelation functions $\rho_{IAT}(k)$ obtained from the Pareto
model (solid line with symbols), superimposed on the actual autocorrelation functions 
derived from the Ethernet traces. Also shown in Figure 6 are the linear 
approximations to the trace autocorrelation functions (solid lines without symbols). 
Note that for clarity of presentation, the plots for traces 2 and 3 are offset from trace 
1 by factors of 0.1 and 0.01 respectively. 




Figure 6 Interarrival time autocorrelation functions 



Since $\rho_{IAT}(k)$ is difficult to calculate exactly for the Pareto model, the Pareto
results in Figure 6 were obtained by simulation of the fitted Pareto model. For traces 1
and 2 the model is in reasonable agreement with the data, while for trace 3 it
under-estimates the IAT autocorrelations. Note in particular that for traces 1 and 2 the
slope of the predicted curve is close to that of the data, which suggests that the correct
value of H was used.

Figure 7 shows the IAT autocorrelation plots for trace 1 only. The fitted model is
represented by lines with symbols, the actual data by a line, and the linear
approximation to the data by the straight line without symbols. Also shown on Figure
7 are the predicted IAT autocorrelation functions in cases where the value of $\kappa$
in equations (11) and (12) is either multiplied or divided by a factor of 10, relative to
the fitted value $\kappa_0$ (triangles). The plot in Figure 7 shows that the Pareto model
is quite sensitive to the value of $\kappa$.
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correlation correlation 





Figure 8 shows the corresponding plots of the BPC autocorrelation functions
$\rho_{\mathrm{BPC}}(k)$. Symbols represent results derived via simulation of the fitted Pareto model,
while the lines without symbols represent the actual data. The model again fits traces 
1 and 2 reasonably well, while it under-estimates the BPC autocorrelations for trace 
3. The closest match to the actual BPC autocorrelation function appears to be obtained 
for trace 2, which is also the trace for which the most samples (i.e., arrival events) 
were collected. The accuracy of the fit in this case may reflect the fact that more data 
was available in trace 2. 



6 QUEUEING PERFORMANCE 

Figure 9 shows the mean time averaged queue lengths obtained when each of the three 
Ethernet traces is used to drive a G/D/1 simulation (dotted lines). These curves were 
obtained by running the simulation repeatedly, with the service time adjusted to give 
a specified utilization. Also shown in Figure 9 are the three queue length curves 
predicted by the fitted Pareto model (solid lines). The Pareto curves under-estimate 
the mean queue lengths for utilizations less than about 0.7. At higher utilizations, the 
Pareto curves rise more steeply than the actual trace driven curves. 




Figure 9 Mean queue length plots 

In general, the Pareto model does not closely match the trace driven queue lengths. 
Note, however, that the y-axis on Figure 9 extends over a very large range. Given that 
the traces each contain only about 10^ arrivals, one would not expect the curves to be
accurate in the upper ranges of Figure 9. The traces may contain too few events to
accurately predict the probabilities of very large queue lengths. 



Another related reason for inaccuracy in Figure 9 may be that the traces are not 
sufficiently long to average out the large variability inherent in self-similar processes. 
Figure 10 shows five queue length curves obtained by using five equal parts of trace 
1 to drive separate simulations. The queue length results vary significantly within this 
set, even though they are part of the same trace. The unconnected symbols in Figure 
10 represent the overall average queue length for trace 1. 




Figure 10 Mean queue length plots for trace 1 



For the above reasons, it may be more relevant to check whether the Pareto model 
accurately predicts the ‘knee’ in the queue length curve. Figure 11 shows the same
plots as in Figure 9, but on a more realistic scale.

Figure 11 Mean queue length plots

This figure shows that the Pareto model fails to predict the knee in the queue length 
curves for traces 1 and 3 - it predicts the knee to occur at utilizations around 0.7 rather 
than 0.5. However, the Pareto model does correctly predict the knee for trace 2 to 
occur at a utilization of 0.55. 

In conclusion, the length of the Ethernet traces does not permit an accurate decision 
to be made regarding the Pareto model’s effectiveness at predicting queueing 
behavior. It appears that significantly longer traces are required in order to reach 
definite conclusions. However, the Pareto model does accurately predict the knee in 
the queue length curve for trace 2, and it is possible that the model is successful in this 
case because it accurately matches the BPC correlation structure in Figure 8. If this 
observation is correct, the proposed Pareto model has the potential to accurately 
predict queueing behavior if it is fitted to the data in such a way as to provide a closer 
match with $\rho_{\mathrm{BPC}}(k)$. This issue is the subject of further research.



7 CONCLUSIONS 

This paper proposes a simple four-parameter model of self-similar data traffic. The 
basic element of the model is a single Pareto arrival stream, in which packet 
interarrival times are iid Pareto distributed. The model proposes that two such streams
be multiplexed or superimposed, to produce a composite stream in which the marginal
IAT distribution is Pareto-like, and which exhibits power law IAT and BPC
correlations. Based on analysis of three Ethernet traces collected by Bellcore, the
proposed model is in good qualitative agreement with observed marginal IAT
distributions, and with the corresponding IAT and BPC autocorrelation functions.

In simulated queueing studies, the proposed model accurately reproduced the ‘knee’ 
in the mean delay curve for one Ethernet trace. It did not accurately predict the 
queueing behavior of the other two traces. However, this may have been due to 
shortcomings of the proposed parameter estimation method, rather than shortcomings 
of the model itself. Further research into parameter estimation techniques, and / or 
analysis of longer Ethernet traces, is necessary in order to reach definite conclusions 
regarding the proposed model’s effectiveness at predicting queueing behavior. 

In summary, the proposed model exhibits the qualitative features present in
self-similar Ethernet traffic: Pareto-like IAT marginals, and power law IAT and BPC
correlations. The model is conceptually simple, and easy to work with numerically
(i.e., to simulate). It provides physical insight into the nature of long range
correlations, and the relationship between IAT and BPC correlations. Although not
emphasized in the body of the paper, the model can fit non-self-similar as well as
self-similar arrival processes. In the case of non-self-similar arrivals, the fitted Pareto
streams will have larger α values (i.e., the Hurst parameter is equal to 0.5), and the power
law tail of the IAT distribution will decay more rapidly, so that it more closely resembles
exponential decay. A multiplexed Pareto model therefore has the potential to be a
unified model of both self-similar and non-self-similar traffic.

The model proposed here can be generalized in several directions. For example, one 
could multiplex more than two streams (see [2] for a discussion of this idea). 
Refinement of the Pareto model is a subject of ongoing research.

Abstract

We present an analysis of Asynchronous Transfer Mode (ATM) traffic 
data collected from the London MAN (Metropolitan Area Network). The 
traffic is found to have different characteristics depending on its intensity. 
The correlations between inter-arrival times of sessions for traffic from
heavily utilised sites are insignificant at the 5% significance level and a
Poisson point process is thus appropriate as a model. Traffic from low
utilisation sites, on the other hand, shows long-memory characteristics
with correlation at lag 40 and more. This traffic is modelled firstly using
an autoregressive process and then the well known Markov Modulated
Poisson Process (MMPP). The burst size distribution for both correlated
and uncorrelated traffic shows distinct peaks which correspond to the
fragmentation that occurs as data travels through networks with different
Maximum Transmission Units (MTUs).



1 Introduction 

ATM technology is being standardised as the networking technology for many 
of the emerging wide area Broadband Integrated Service Digital Networks (B- 
ISDN) protocol stacks. It is thus important to have a good understanding of 
the behaviour of the traffic and its load so that any analytical models that are 
built to predict performance take a realistic workload model as input. Unre- 
alistic workload models can easily lead to either very pessimistic or optimistic 
predictions [12]. 

Some work that has been done on Ethernet traces e.g. [15], [20] and [26] has 
cast doubt on the use of Poisson arrival processes as models of the traffic. We 
have set out to study whether this too is the case for ATM networks and how we
may model the traffic. We have collected and analysed ATM traffic at cell level
and present models for it based on traffic and the workings of network protocols
through which the traffic travels. The paper starts with a description of the
experimental setup and the capture procedure. In section 2, the traffic data is
analysed and the two main categories that the traces fall into are identified. In
section 3 we present an analysis of the traffic data based on methods employed
in time series analysis and show how traffic from sites that have heavy traffic
differs from light traffic. Section 4 considers models for correlated traffic using
techniques from time series analysis. The autocorrelation function is then used
to parameterise a Markov Modulated Poisson Process (MMPP). In section 5
we consider the burst-size distribution of cells and present models for it. The
paper concludes in Section 6.

*The authors would like to thank Jonathan Couzens of the London MAN for his help in
the monitoring and capture of data used in this research.
imb3, pgh@ic.ac.uk

2 Measuring IP traffic over ATM 

There are very few ATM networks that are sufficiently utilised to provide rep- 
resentative traces. However, the network from which our measurements are 
taken is the London MAN which has high utilisation due to Telehouse which is 
the hub for the UK’s commercial Internet traffic and is one of the nodes on the 
network. The data obtained from this network is live and not under controlled 
experimental conditions. 

Figure 1 illustrates the network configuration for the MAN and an enlarged 
view of the Imperial College node where traffic was captured. The switch is a 
Fore Systems’ ASX-200 ATM switch; the connection with the MAN is a 155 
Mb/s OC-3c multi-mode fibre link. The router is a Cisco router with a Cisco
API ATM adapter card interfacing with the switch. The adapter layer embeds 
a dedicated processor and special-purpose hardware to handle AAL5. The 
physical layers of the LANs in the sites on the London MAN consist entirely of 
Ethernet. 

2.1 Measurement Tool 

The data is collected using a Hewlett Packard Broadband Series Test System 
at the point indicated on Figure 1. Incoming data is captured at full line rates 
up to 155 Mb/s. The switch is configured so as to make a copy of each cell and 
send it to a spare port to which the analyser is attached. 

The analyser has 8 Mbytes of RAM, enough to capture 131,072 ATM cells
(including reassembly overhead) in one session. The system can capture in both
continuous (write to buffer in circular fashion) and burst mode (stop writing
to the buffer when it is full). We used the burst mode mainly and were able to
capture cells for up to 30 minutes for sites with low utilisation and up to 1 minute
on sites with heavy utilisation. All inter-arrival times can be measured to an
accuracy of 10 nanoseconds on this system. The captured cells are illustrated
in Figure 2. The low-order bit of the payload type field of the ATM cell is used
to mark the final cell in a packet. If the field value is 1 or 3 the cell is the
last in the packet. 2 or 0 is used to indicate congestion and 4, 5, 6 or 7 is used
for resource reservation.

Figure 1: The Experimental Site

Figure 2: An example captured header cell (annotations: unique path and channel
identifiers; payload type indicates trailer)
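
The end-of-packet rule described above can be sketched as follows. This is a minimal illustration, assuming the three-bit payload type value of each captured cell is already available as an integer; the function name `packet_sizes` and the sample values are hypothetical and are not taken from the authors' Perl filters.

```python
def packet_sizes(pti_values):
    """Group consecutive cells into packets using the payload type field.

    A cell closes the current packet when its payload type value is 1 or 3,
    i.e. when the low-order bit is set on a user-data cell (value below 4).
    """
    sizes = []
    current = 0
    for pti in pti_values:
        if pti >= 4:
            # Values 4-7 do not mark user data; they are skipped here.
            continue
        current += 1
        if pti & 1:
            sizes.append(current)
            current = 0
    return sizes


# Hypothetical payload type values for seven consecutive cells.
print(packet_sizes([0, 0, 1, 0, 1, 2, 3]))
```

The resulting list of cells per packet is the raw material for a burst-size distribution.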



2.2 Experimental Procedure 

We have conducted many captures per switch port, each data collection having 
been done at different times and on different days so as to limit the effect of 
statistical variations. Table 1 gives the times at which each capture started 
and ended to the nearest second. The intensity column of the table gives the 
levels of intensity X/Y where X is the intensity when the capture was started 
and Y is the intensity when the capture stopped. There are 3 intensity levels; 
Low(L), High(H) and Moderate(M). The data was captured on weekdays only 
as we noticed a marked decrease in usage patterns on Saturday and Sunday. 
By so doing, we have also eliminated a potential source of seasonality in our 
data. 

The data is fed through a series of filters which classify each cell according 
to the IP protocol: TCP, UDP or ICMP. The filters are written in Perl 5 and 
based on pattern matching. Perl was chosen as the language to write the filters 
because it is optimised for pattern matching and it is easy to pipe data in 
parallel to as many files as desired. 

We match on the different parts of a cell and pipe the desired statistics 
e.g. the time-stamp to an appropriate file. First the AAL header is matched 
with the standard IP header to identify the start of an IP packet, then the 
IP protocol is identified and finally pattern matching is done on the type of 
protocol (see Fig. 2). The largest proportion of traffic was TCP (80%), UDP 
accounted for 10% and ICMP was less than 0.01% of the traffic. The main 
constituents of the TCP traffic were FTP, Telnet, SMTP and Web (or HTTP). 
As the proportion of traffic from ICMP, other TCP and UDP was so small, it 
was collectively classified under “miscellaneous” . 

The inter-arrival times between cells are obtained in the usual way, by taking 
the difference between timestamps of successive cells. 

The filtered data clearly showed:

• The inter-arrival time between cells that belonged to the same packet is
of the order of 8 µs to 20 µs, and 10 ms to 1 s between successive packets.

• The minimum number of cells within a packet is 2 and the maximum is 
32. 

This indicates that, because inter-arrival times of packets are much larger than
inter-arrival times of cells within a packet, we must analyse the two separately.
Hence we analyse our data as arrival points of batches and define a batch as an
arrival composed of 2 or more cells. Our analysis concentrates on the modelling of
inter-arrival times between cells and the burst-size distribution at each arrival
instant.







S.no  Date      Start     End       Data set     Intensity
1     04/02/97  11:54:14  11:54:35  t9702041154  H/H
2     04/02/97  14:20:13  14:20:15  t9702041420  H/H
3     06/02/97  12:33:43  13:03:25  t9702061233  H/M
4     06/02/97  14:39:46  15:50:05  t9702061439  H/H
5     06/02/97  18:23:59  18:24:21  t9702061823  M/M
6     06/02/97  19:31:57  19:32:09  t9702061931  L/L
7     07/02/97  14:50:39  14:50:55  t9702071450  H/H
8     07/02/97  15:55:10  15:55:12  t9702071555  H/H
9     07/02/97  17:06:53  17:07:10  t9702071706  M/M
10    10/02/97  11:51:04  11:51:23  t9702101151  H/H
11    10/02/97  16:59:32  16:59:45  t9702101659  H/H
12    11/02/97  07:18:59  07:19:35  t9702110718  L/L
13    11/02/97  14:55:24  14:55:32  t9702111455  H/H
14    12/02/97  09:47:13  09:47:37  t9702120947  L/L
15    12/02/97  14:53:54  14:54:03  t9702121453  M/M
16    13/02/97  10:09:09  10:09:39  t9702131009  H/H
17    18/02/97  11:27:03  11:33:48  t9703181127  H/H
18    18/02/97  16:12:32  16:18:50  t9703181612  H/H
19    18/02/97  18:20:52  18:28:02  t9703181820  M/M
20    19/03/97  13:08:05  13:14:55  t9703191308  H/H
21    24/03/97  10:16:49  10:17:10  t9703241016  H/H
22    24/03/97  13:45:26  13:45:38  t9703241345  H/H
23    24/03/97  14:51:17  14:51:31  t9703241451  M/M
24    24/03/97  15:58:39  15:58:49  t9703241558  H/H
25    02/04/97  14:54:07  14:54:21  t9704021454  L/H
26    02/04/97  15:56:28  15:56:46  t9704021556  H/H
27    02/04/97  17:19:27  17:19:44  t9704021719  H/H
28    02/04/97  18:32:51  18:33:07  t9704021832  M/M
29    03/04/97  08:33:42  08:34:06  t9704030832  L/L
30    03/04/97  11:08:19  11:08:38  t9704031108  H/M
31    03/04/97  12:14:25  12:14:44  t9704031214  H/H
32    03/04/97  14:12:55  15:24:40  t9704031412  H/M
33    03/04/97  16:09:37  19:00:09  t9704031609  H/N
34    04/04/97  12:56:29  14:27:05  t9704041256  M/H


Table 1: Traffic log 



3 Analysis of inter-arrival times 

We feed the data through a filter that generates inter-arrival times of batches 
and perform identical data analysis on each of the sets in Table 1. The results 
can be divided into 2 categories; those for heavy traffic channels and those for 
light traffic channels. We have focused on 2 sets: t9702041154, representative 
of heavy traffic, and t9702061439, representative of light traffic. A full set of 
results from each data set in Table 1 can be found in [2].

To test for correlation, we compute the auto-correlation function of the
inter-arrival times given by equation 1 for lag $k = 1 \ldots N$, where $N$ is the number of
data points in the traffic log.

The auto-correlation coefficient $\rho_k$ at lag $k$ is given by:

\[
\rho_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2} \tag{1}
\]

where $x_t$ is the inter-arrival time at time $t$ and $\bar{x}$ is the mean inter-arrival time.

In the first instance we plot correlograms (plots in which $\rho_k$ is plotted
against $k$) for each protocol type and data set, Figures 3, 4, 7 and 8. The
approximate 95% confidence limits at $\pm 2/\sqrt{N}$ are used to test the null hypothesis
that the data is random. We can assume that the time series are stationary
because:

• The correlograms decay to zero reasonably quickly. 

• From our analysis of the 34 data sets in Table 1, we noticed that the correl- 
ograms of data sets with the same intensity, have the same characteristics 
regardless of the day or time the data was collected. We thus proceed 
showing typical traffic behaviour from the sites with heavy utilisation and 
that from sites with low utilisation. 

3.1 Typical correlograms from sites with heavy utilisation 

Plots from high utilisation sites (Figures 3 and 4) show that less than 5% of
the values of $\rho_k$ lie outside the 95% confidence limits and those that do are at
apparently arbitrary lags. We can thus conclude (see [3], [4]) that there is no
firm evidence to reject the hypothesis that the observations are independently
distributed.
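
The correlogram test described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names and the sample inter-arrival times are hypothetical, and the coefficient follows the usual sample form of equation 1 (lagged products of deviations from the mean, divided by the sum of squared deviations).

```python
import math


def autocorrelation(x, max_lag):
    """Sample autocorrelation coefficients rho_1 .. rho_max_lag of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def fraction_outside_band(rho, n):
    """Fraction of coefficients outside the approximate 95% band +/- 2/sqrt(n)."""
    limit = 2.0 / math.sqrt(n)
    return sum(1 for r in rho if abs(r) > limit) / len(rho)


# Hypothetical inter-arrival times in seconds, for illustration only.
iat = [0.012, 0.030, 0.011, 0.045, 0.020, 0.015, 0.038, 0.009, 0.027, 0.033]
rho = autocorrelation(iat, 3)
print(rho, fraction_outside_band(rho, len(iat)))
```

A fraction of about 5% or less, at lags with no apparent pattern, is what one would expect from uncorrelated data.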

In order to check whether the inter-arrival times of batches of cells can be
modelled with an exponential distribution, the data is binned into histograms
and the exponential function

\[
f(x) = \lambda e^{-\lambda x}
\]

fitted using the least squares method. The results for aggregate traffic from
data set t9702041154 are shown in Figure 5. Figure 5A shows the fitting of an
exponential function to the histogram of inter-arrival data by taking the original
definition of a batch as in section 2.2, Fig. 5B shows a fit on considering cells
arriving within 0-0.0001 s as being part of a batch and Fig. 5C shows the best fit
when cells arriving within 0-0.0002 s are considered as part of a batch. The plots
suggest that the tail of the distribution in Fig. 5A is exponential. We postulate
that the reason the first bins diverge from exponential behaviour is
a second level of burstiness at the IP (or packet) level.
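
The time-window batch definitions used for Fig. 5B and 5C can be sketched as follows. This is only an illustration: it assumes the window is applied to the gap between successive cells and that batch inter-arrival times are measured between the first cells of successive batches, neither of which is stated explicitly in the text. The function name and timestamps are hypothetical.

```python
def batch_inter_arrivals(timestamps, max_gap):
    """Merge cells into batches; return (batch sizes, batch inter-arrival times).

    A cell joins the current batch when it arrives within max_gap seconds of
    the previous cell; otherwise it starts a new batch. Inter-arrival times
    are taken between the first cells of successive batches (an assumption).
    """
    if not timestamps:
        return [], []
    starts = [timestamps[0]]
    sizes = [1]
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev <= max_gap:
            sizes[-1] += 1
        else:
            starts.append(cur)
            sizes.append(1)
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return sizes, gaps


# Hypothetical cell timestamps in seconds, grouped with a 0.0001 s window.
cells = [0.0, 0.00001, 0.00002, 0.01, 0.01001, 0.05]
print(batch_inter_arrivals(cells, 0.0001))
```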

In order to investigate this, we pass the data through another filter that uses 
a second definition of a batch. This time we consider all adjacent transmissions 
with the same source-destination pair (see Fig. 2) as a batch arrival. When we 
now plot the histogram of inter-arrival times and do a non-linear least squares
fit, we find that we have a good fit (Figure 6). This indicates that the
inter-arrival times between TCP/IP sessions are exponential and that a Poisson point
process is a good model for these arrivals. To test the goodness of fit, we perform
the $\chi^2$ test. The number of degrees of freedom ($\nu$) is 258. The $\chi^2$ statistic is
302.09677; for $\nu > 100$, $\sqrt{2\chi^2}$ is approximately normally distributed with mean
$\sqrt{2\nu - 1}$ and unit variance. Thus the critical value $\chi^2_{0.05,258} = 303.9003$. Since
302.09677 < 303.9003, we do not reject the hypothesis that an exponential
distribution is an appropriate model.

Figure 6: TCP batches removed from inter-arrival times set: t9702041154
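
The normal approximation behind the quoted critical value can be checked numerically. The sketch below assumes a standard normal quantile of z = 1.96; this value is not stated in the text, but it is the one that reproduces the quoted 303.9003. The function name is hypothetical.

```python
import math


def chi2_critical_normal_approx(dof, z):
    """Critical chi-squared value from sqrt(2*chi2) ~ Normal(sqrt(2*dof - 1), 1)."""
    return 0.5 * (math.sqrt(2.0 * dof - 1.0) + z) ** 2


critical = chi2_critical_normal_approx(258, 1.96)  # z = 1.96 is an assumption
statistic = 302.09677                              # value quoted in the text
print(round(critical, 4), statistic < critical)
```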

3.2 Typical correlograms from sites with light utilisation 

Correlograms from the low utilisation sites (Figures 7 and 8) show different
behaviour. Inter-burst arrival times for traces of aggregate traffic and the web
protocol from set t9702061439 display heavy correlation at long lags,
indicating long memory behaviour. From our analysis of data sets with L/L intensity
(Table 1), we find that the correlogram of inter-burst times of aggregate traffic
always displays long-memory characteristics. The major cause of this behaviour
is the long sessions in Web traffic, as evident in Figure 7. In fact, up to a lag
of 80, there is very little difference between the correlogram from Web traffic and
that from the aggregate traffic.

We propose to model the long-memory behaviour using time-series
models. The choice of which type of process to fit is notoriously difficult [3], [13].
However, recent studies in [24] show that Autoregressive (AR) models can be 
used effectively as models for data with long-memory or long-range dependence 
and are in fact almost as effective as the powerful FARMA models which are a 
generalisation of autoregressive integrated moving average models in which the 
degree of differencing is allowed to take fractional values. Fitting AR processes 
to the data entails finding out the order and parameters of the process. 

However, AR models can only be used to simulate a time series. In order
to create workload models for queueing network-type models, we aim to show
how a stochastic process, the well-known MMPP (section 4), can
capture the correlation structure of the inter-arrival times. We also show how
the MMPP can be parameterised using just the auto-correlation function.

Figure 7: Autocorrelation function of inter-packet times. Ref: t9702061439



4 Models for correlated traffic 

4.1 A time series model 

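An autoregressive process of order $p$, with mean $\mu$, is given by

\[
X_t - \mu = \alpha_1 (X_{t-1} - \mu) + \cdots + \alpha_p (X_{t-p} - \mu) + Z_t
\]

where $Z_t$ is a Normal random variable with zero mean. The parameters $p$ and
$\alpha_1, \ldots, \alpha_p$ may be estimated by substituting the sample autocorrelation
coefficients $\rho_1, \ldots, \rho_p$ into the first $p$ Yule-Walker equations and solving for
$(\hat{\alpha}_1, \ldots, \hat{\alpha}_p)$ (for further details see [4]). In matrix form these equations are:

\[
R \hat{\alpha} = r
\]

where

\[
R = \begin{pmatrix}
1 & \rho_1 & \rho_2 & \cdots & \rho_{p-1} \\
\rho_1 & 1 & \rho_1 & \cdots & \rho_{p-2} \\
\rho_2 & \rho_1 & 1 & \cdots & \rho_{p-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{p-1} & \rho_{p-2} & \rho_{p-3} & \cdots & 1
\end{pmatrix},
\quad
\hat{\alpha} = \begin{pmatrix} \hat{\alpha}_1 \\ \hat{\alpha}_2 \\ \vdots \\ \hat{\alpha}_{p-1} \\ \hat{\alpha}_p \end{pmatrix},
\quad
r = \begin{pmatrix} \rho_1 \\ \rho_2 \\ \vdots \\ \rho_{p-1} \\ \rho_p \end{pmatrix}
\]
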



The order p is estimated using the partial autocorrelation function and the 
Akaike’s Information Criterion (AIC) [1]. 

4.1.1 The Partial Autocorrelation Function 

When fitting an AR($p$) model, the last coefficient, $\alpha_p$, measures the excess
correlation at lag $p$ not captured by an AR($p-1$) model. The partial
autocorrelation function (pcf) is a function of the order $p$ and the last coefficient $\alpha_p$.
By fitting AR processes of successively higher order $m$ and taking the estimate
$\hat{\alpha}_m$ to be approximately normally distributed, we can test the null
hypothesis that $\alpha_m = 0$ and hence that the order of the AR process is $m - 1$. It can
be shown that $\hat{\alpha}_m$ lies in the band $\pm 2/\sqrt{N}$ with probability 95% if $\alpha_m = 0$,
assuming a Normal distribution [23].
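
One common way to fit AR processes of successively higher order is the Durbin-Levinson recursion, which solves the Yule-Walker equations for orders 1 to p in turn and yields the last coefficient of each fit as a by-product. The sketch below is an illustration rather than the authors' procedure; the function name, the sample autocorrelations and the number of observations are hypothetical.

```python
import math


def durbin_levinson(rho):
    """Solve the Yule-Walker equations for orders 1..p.

    rho holds the autocorrelations rho_1 .. rho_p. Returns (alpha, pacf):
    alpha is the coefficient list of the order-p fit, and pacf[m-1] is the
    last coefficient of the order-m fit, i.e. the partial autocorrelation.
    """
    alpha = []
    pacf = []
    for m in range(1, len(rho) + 1):
        num = rho[m - 1] - sum(alpha[j] * rho[m - 2 - j] for j in range(m - 1))
        den = 1.0 - sum(alpha[j] * rho[j] for j in range(m - 1))
        last = num / den
        alpha = [alpha[j] - last * alpha[m - 2 - j] for j in range(m - 1)] + [last]
        pacf.append(last)
    return alpha, pacf


rho = [0.5, 0.25, 0.125]            # hypothetical sample autocorrelations
alpha, pacf = durbin_levinson(rho)
n_points = 400                      # hypothetical number of observations
band = 2.0 / math.sqrt(n_points)
significant = [m for m, a in enumerate(pacf, start=1) if abs(a) > band]
print(alpha, pacf, significant)
```

Orders whose last coefficient falls inside the band are candidates for truncating the model.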

The pcfs of the aggregate and web data from data set t9702061439 are shown 
in Figure 9. It can be seen that in this instance, the order is not obvious, and 
we thus use the AIC.

Figure 9: Partial autocorrelation function of inter-packet times. Ref: t9702061439

4.1.2 The Akaike’s Information Criterion (AIC) 

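This is a very general criterion based on information theoretic concepts. It uses
the quantity:

\[
\mathrm{AIC}(q) = n \log \hat{\sigma}_q^2 + 2q \tag{2}
\]

where

\[
\hat{\sigma}_q^2 = \phi(0) - \hat{\alpha}_1 \phi(1) - \cdots - \hat{\alpha}_q \phi(q) \tag{3}
\]

is the residual variance of the fitted AR($q$) model, written here in the sign
convention of the AR process defined in section 4.1, $n$ is the number of
observations and $\phi(k)$ is the estimated covariance at lag $k$. If we plot AIC($q$) against $q$, the graph
in general will show a clear minimum and the appropriate order of the model is
determined by the value of $q$ at which AIC($q$) attains its minimum value. The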
book by Priestley [22] as well as the original paper by Akaike [1] give full details 
on the workings of the AIC. In this case, the plot shows a minimum value at 
40 which is therefore taken to be the order of the AR model. 
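
The order selection can be sketched as follows. This is an illustration under stated assumptions: the residual variance of each fit is expressed relative to the lag-0 covariance through the standard identity that the AR(q) residual variance equals the lag-0 covariance times the product of (1 - a_m^2) over m = 1..q, where a_m is the partial autocorrelation at lag m. The function name and sample values are hypothetical.

```python
import math


def aic_order(pacf, n):
    """Return (best order, AIC values for q = 0 .. len(pacf)).

    Uses AIC(q) = n*log(var_q) + 2q with var_q the residual variance relative
    to the lag-0 covariance. Dropping that common factor shifts every AIC(q)
    by the same constant, so the position of the minimum is unchanged.
    Each partial autocorrelation must have magnitude below 1.
    """
    values = [0.0]
    var = 1.0
    for q, a in enumerate(pacf, start=1):
        var *= 1.0 - a * a
        values.append(n * math.log(var) + 2 * q)
    best = min(range(len(values)), key=values.__getitem__)
    return best, values


# Hypothetical partial autocorrelations and sample size.
best, values = aic_order([0.5, 0.0, 0.0], 1000)
print(best, values)
```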

4.2 The MMPP model 

Though we have characterised the inter-arrival time distribution of ATM cells as 
a stochastic model, a time series model is inappropriate in analytical queueing 
network models. Hence we seek to show a link between the autocorrelation 
function of inter-arrival times from the sampled traffic and the well known 
Markov Modulated Poisson Process (MMPP). We then use a MMPP, which 
has been parameterised using the autocorrelation function, to model correlated 
ATM traffic. 

The MMPP is a doubly stochastic Poisson process and was first used in 
queueing theory by Naor and Yelachi [17] and Neuts [18]. Poisson arrivals are 
generated by a source with rate governed by an m-state irreducible continuous- 
time Markov Chain (CTMC) which is independent of the instantaneous arrival 
process.

While the underlying CTMC is in state or phase $i$, let arrivals occur
according to a Poisson process with rate $\lambda_i$. The MMPP is characterised by the
following parameters:

1. The generator matrix of the underlying modulating CTMC, $Q$:

\[
Q = \begin{pmatrix}
-q_1 & q_{12} & q_{13} & \cdots & q_{1m} \\
q_{21} & -q_2 & q_{23} & \cdots & q_{2m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
q_{m1} & q_{m2} & q_{m3} & \cdots & -q_m
\end{pmatrix} \tag{4}
\]

where $q_i = \sum_{j=1, j \neq i}^{m} q_{ij}$ and $q_{ij}$ is the transition rate from state
$i$ to $j$. In the sequel, the transition rate matrix $Q$ is assumed to be
time-homogeneous, i.e. the process we are modelling is stationary and therefore
$Q$ does not vary with time.

2. The Poisson arrival rate at each phase is given by the diagonal matrix

\[
\Lambda = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_m
\end{pmatrix} \tag{5}
\]

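
An MMPP defined by these two parameters can be simulated by treating arrivals and phase changes as competing exponential events. The sketch below is a minimal illustration, not the authors' code; the function name is hypothetical, the chain is started in state 0 rather than in equilibrium, and the two-state values are those listed as the "True solution" in Table 2.

```python
import random


def simulate_mmpp(Q, lam, n_arrivals, seed=0):
    """Return arrival times of an m-state MMPP started in state 0 at time 0."""
    rng = random.Random(seed)
    m = len(lam)
    state = 0
    now = 0.0
    arrivals = []
    while len(arrivals) < n_arrivals:
        leave = -Q[state][state]        # total rate out of the current state
        total = lam[state] + leave      # rate of the next event of either kind
        now += rng.expovariate(total)
        if rng.random() * total < lam[state]:
            arrivals.append(now)        # the event is an arrival
        else:
            # The event is a phase change; pick the target in proportion to q_ij.
            weights = [Q[state][j] if j != state else 0.0 for j in range(m)]
            state = rng.choices(range(m), weights=weights)[0]
    return arrivals


Q = [[-0.5, 0.5], [0.2, -0.2]]
lam = [0.9, 0.1]
times = simulate_mmpp(Q, lam, 100000)
print(times[-1] / len(times))  # empirical mean inter-arrival time
```

Differencing the returned arrival times gives a synthetic inter-arrival series whose correlogram can be compared with measured data.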



4.2.1 Joint inter-arrival time distribution

In order to find the inter-arrival time distribution of an MMPP, we use the
sequence $\{(J_k, X_k), k \ge 0\}$ where $X_k$ is the time between the $(k-1)$th and $k$th
arrivals and $J_k$ is the state of the CTMC at the $k$th arrival instant. See also [8].
The resulting Markov renewal sequence has joint state-transition/inter-arrival
time probability distribution matrix given by

\[
\begin{aligned}
F(x) &= \int_0^x e^{(Q-\Lambda)u}\, du\, \Lambda \\
     &= \left[ -e^{(Q-\Lambda)u} (\Lambda - Q)^{-1} \right]_0^x \Lambda \\
     &= \{ I - e^{(Q-\Lambda)x} \} (\Lambda - Q)^{-1} \Lambda
\end{aligned}
\]

The elements $F_{ij}(x)$ are the conditional probabilities
$P\{J_k = j, X_k \le x \mid J_{k-1} = i\}$ for $k \ge 1$. The corresponding probability density function is

\[
f(x) = \frac{d}{dx} F(x) = \{ e^{-(\Lambda - Q)x} (\Lambda - Q) \} (\Lambda - Q)^{-1} \Lambda = e^{(Q-\Lambda)x} \Lambda
\]

The joint density of $X_1, \ldots, X_n$, with the state $j_n$ ($n \ge 1$), conditioned on
the state $j_0$, is the n-fold convolution of the probability density function matrix.
Its joint Laplace transform is therefore:

\[
f^*(s_1, \ldots, s_n) = \prod_{k=1}^{n} (s_k I - Q + \Lambda)^{-1} \Lambda
\]

4.2.2 The autocorrelation function of the inter-arrival time distribution

We determine the conditional moments of the time between arrivals by taking
partial derivatives of the Laplace transform of the joint probability density
function matrix for a sequence of n successive arrival instants. We then
decondition over the initial states (assuming stationarity) and sum over the final
states. We can then obtain an MMPP in terms of the autocovariance function
of the underlying time series.

Let $\pi$ represent the stationary vector of $Q$ (eq. 4) of the CTMC, which
satisfies $\pi Q = 0$ and $\pi e = 1$, where $e = (1, \ldots, 1)^T$. Let the equilibrium
probability that an arbitrary arrival finds the CTMC in state $i$ be $\pi^*_i$. Then

\[
\pi^*_i = \frac{\pi_i \lambda_i}{\sum_{j=1}^{m} \pi_j \lambda_j}
\]

i.e. the state probability vector on arrival is:

\[
\pi^* = \frac{\pi \Lambda}{\pi \Lambda e}
\]

Thus

\[
E[X_1^l] = \pi^*\, l!\, (\Lambda - Q)^{-(l+1)} \Lambda e, \quad l \ge 1
\]

In particular, the first moment of $X_k$ is given by:

\[
E[X_k] = \pi^* [(\Lambda - Q)^{-1} \Lambda]^{k-1} (\Lambda - Q)^{-2} \Lambda e, \quad 1 \le k \le n \tag{6}
\]

The expected value of the product $X_1 X_{k+1}$ is given similarly by taking partial
derivatives of $f^*(s_1, \ldots, s_{k+1})$ w.r.t. $s_1$ and $s_{k+1}$ at $s_1 = s_2 = \cdots = s_{k+1} = 0$.

\[
E[X_1 X_{k+1}] = \pi^* (\Lambda - Q)^{-2} \Lambda [(\Lambda - Q)^{-1} \Lambda]^{k-1} (\Lambda - Q)^{-2} \Lambda e
\]

The autocovariance at lag $k$, $E[(X_1 - E[X_1])(X_{k+1} - E[X_{k+1}])] = E[X_1 X_{k+1}] - E[X_1] E[X_{k+1}]$, is therefore

\[
\phi_k = \pi^* (\Lambda - Q)^{-2} \Lambda \left[ \{ I - e \pi^* (\Lambda - Q)^{-1} \Lambda \} \{ (\Lambda - Q)^{-1} \Lambda \}^{k-1} \right] (\Lambda - Q)^{-2} \Lambda e, \quad k \ge 1
\]

The autocovariance at lag 0, i.e. the variance of inter-arrival times, is given by:

\[
\begin{aligned}
\phi_0 &= E[X_1^2] - E[X_1]^2 \\
       &= 2 \pi^* (\Lambda - Q)^{-3} \Lambda e - (\pi^* (\Lambda - Q)^{-2} \Lambda e)^2
\end{aligned}
\]

The autocorrelation function for a MMPP, $\rho_k = \phi_k / \phi_0$, for $k \ge 1$ is thus

\[
\rho_k = \frac{\pi^* (\Lambda - Q)^{-2} \Lambda \left[ \{ I - e \pi^* (\Lambda - Q)^{-1} \Lambda \} \{ (\Lambda - Q)^{-1} \Lambda \}^{k-1} \right] (\Lambda - Q)^{-2} \Lambda e}{2 \pi^* (\Lambda - Q)^{-3} \Lambda e - (\pi^* (\Lambda - Q)^{-2} \Lambda e)^2} \tag{7}
\]

4.2.3 Mean of the Inter-arrival times (IAT)

The first moment (mean) of the IAT is (from Eq. 6)

\[
\begin{aligned}
E[X_1] &= \pi^* (\Lambda - Q)^{-2} \Lambda e \\
       &= \frac{\pi \Lambda (\Lambda - Q)^{-1} (\Lambda - Q)^{-1} \Lambda e}{\pi \Lambda e} \\
       &= \frac{\pi (I + Q (\Lambda - Q)^{-1}) (I + (\Lambda - Q)^{-1} Q) e}{\pi \Lambda e} \\
       &= \frac{1}{\pi \Lambda e}
\end{aligned} \tag{8}
\]

as $\pi Q = 0$ and $Qe = 0$, as expected.
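
For a two-state MMPP the mean and the autocorrelations of eq. 7 can be evaluated with plain 2x2 matrix arithmetic. The sketch below is an illustration restricted to two states; the function names are hypothetical, and the parameter values are those listed as the "True solution" in Table 2. It uses the fact that the rows of $(\Lambda - Q)^{-1}\Lambda$ sum to one, so that $(\Lambda - Q)^{-2}\Lambda e = (\Lambda - Q)^{-1} e$.

```python
def mat_mul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_inv(a):
    """Inverse of a 2x2 matrix."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def mmpp2_mean_and_acf(q12, q21, lam1, lam2, max_lag):
    """Mean inter-arrival time and rho_1..rho_max_lag of a two-state MMPP."""
    # Stationary vector of the modulating chain and its arrival-instant version.
    pi = [q21 / (q12 + q21), q12 / (q12 + q21)]
    rate = pi[0] * lam1 + pi[1] * lam2
    pi_star = [pi[0] * lam1 / rate, pi[1] * lam2 / rate]

    m_inv = mat_inv([[lam1 + q12, -q12], [-q21, lam2 + q21]])  # (L - Q)^-1
    p = mat_mul(m_inv, [[lam1, 0.0], [0.0, lam2]])             # (L - Q)^-1 L
    v1 = [sum(row) for row in m_inv]                           # (L - Q)^-2 L e
    v2 = [sum(m_inv[i][k] * v1[k] for k in range(2)) for i in range(2)]

    mean = sum(pi_star[i] * v1[i] for i in range(2))
    var = 2.0 * sum(pi_star[i] * v2[i] for i in range(2)) - mean ** 2

    # E[X_1 X_{k+1}] = pi* (L - Q)^-1 P^k (L - Q)^-1 e, with P = (L - Q)^-1 L.
    left = [sum(pi_star[i] * m_inv[i][j] for i in range(2)) for j in range(2)]
    acf = []
    vec = v1
    for _ in range(max_lag):
        vec = [sum(p[i][k] * vec[k] for k in range(2)) for i in range(2)]
        acf.append((sum(left[j] * vec[j] for j in range(2)) - mean ** 2) / var)
    return mean, acf


mean, acf = mmpp2_mean_and_acf(0.5, 0.2, 0.9, 0.1, 3)
print(mean, acf)
```

The mean and the first two coefficients printed here can be compared with the first column of Table 3.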

4.3 Parameterising a MMPP 

We present a method to parameterise a MMPP by matching the first moment
and the observed autocorrelations against those of the MMPP given by eq. 7
in section 4.2.2. If $(n - 1)$ covariances are to be taken into account, we would
need at least $n$ equations to parameterise the MMPP in this way, which would
then have order of at least $\lceil \sqrt{n} \rceil$, since an MMPP with $m$ states has
$m^2$ free parameters ($m(m-1)$ transition rates and $m$ arrival rates). Eqn. 8 and
Eqn. 7 are used to match the mean and autocorrelations respectively.

The parameters are found numerically by solving a set of non-linear equa- 
tions; we use Broyden’s method — a multidimensional secant method. 

To validate our approach, we first estimate the parameters of a MMPP from
the mean IAT and autocorrelation function of a known MMPP, with generator
matrix $\{q_{ij}\}$ and rate-matrix $\{\lambda_i\}$. Preliminary results, shown in Table 2,
indicate that the method does not converge to a unique solution for different initial
values to the numerical method. The autocorrelations obtained from each
solution, as expected, are very close to the autocorrelations of the original MMPP.
This is not entirely surprising since, although a given stochastic process has a 
unique covariance structure, the converse is not true. It is usually possible to 
find many normal and non-normal processes with the same acf. Jenkins and 
Watts [13] (p. 170) give an example of two different stochastic processes which 
have the same acf. Since our objective is to match autocorrelations, any such 
process is acceptable. Other criteria might prefer one of the possible candidates, 
but we do not consider this further here. The algorithm suffers from the limi- 
tations of Broyden’s method, that the initial values must be close to the roots 
of the equation for non-linear systems. More importantly, if the Jacobian used 
in the calculation of the next estimates of the roots of the equations becomes 
singular or nearly singular, then the difference between the current estimate of 
the root and the next estimate cannot be determined. 

Parameter      True solution    Set 1        Set 2
Q1,1           -0.5             -8.660272    -0.454088
Q1,2            0.5              8.660272     0.454088
Q2,1            0.2              0.240278     0.198401
Q2,2           -0.2             -0.240278    -0.198401
$\lambda_1$     0.9              8.009269     0.856416
$\lambda_2$     0.1              0.115861     0.097955

Table 2: Parameters for the MMPP

Lag     Autocovariance      Autocovariance      Autocovariance
        (True solution)     (Set 1)             (Set 2)
Mean    3.04348             3.03997             3.04341
1       0.063202247191      0.0552495292        0.063195764
2       0.017775632022      0.0132967083        0.017773341687
3       0.004999396506      0.0032000716        0.00499862097
4       0.0014060802674     0.000770149         0.0014058252
5       0.000395460075      0.000185342         0.00039537794
6       0.000111223146      0.000044602         0.00011119712
7       0.0000312815098     0.0000107351        0.00003127336
8       0.0000008797924     0.000000258         0.000000879540399
9       0.0000002474416     0.0000000621803     0.000000247364
10      6.959295 * 10^-7    1.496472 * 10^-7    6.9569362950 * 10^-7

Table 3: The acf for the first 10 lags

4.4 Known Autoregressive models

When it is known that traffic of a certain type or from a particular channel has
a certain stationary AR model (after subtracting out the mean), its autocorrelation
function can be obtained analytically and the method of the previous
subsection applied to obtain the corresponding MMPP directly. (An autoregressive
process, $X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + \cdots + \alpha_k X_{t-k} + Z_t$, is stationary if
and only if the roots of the equation $1 - \alpha_1 B - \alpha_2 B^2 - \cdots - \alpha_k B^k = 0$ lie outside
the unit circle.)

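Consider now an AR(2) process

$$X_t = \alpha_1 X_{t-1} + \alpha_2 X_{t-2} + Z_t$$

Multiplying by $X_{t-k}$ and taking expectations, we obtain

$$\phi_k = \alpha_1 \phi_{k-1} + \alpha_2 \phi_{k-2} + E[X_{t-k} Z_t]$$

for the covariance between $X_t$ and $X_{t-k}$. Now, $E[X_{t-k} Z_t] = 0$ when $k > 0$,
since $X_{t-k}$ can only involve shocks $Z_t$ up to time $t - k$ (see [3]). Thus

$$\phi_k = \alpha_1 \phi_{k-1} + \alpha_2 \phi_{k-2} \quad \text{for } k > 0. \tag{9}$$

Similarly, $\phi_0 = \alpha_1 \phi_1 + \alpha_2 \phi_2 + \sigma_Z^2$ (with $\sigma_Z^2$ the variance of $Z_t$) and for $k = 1, 2$ we have

$$\phi_1 = \alpha_1 \phi_0 + \alpha_2 \phi_1$$

$$\phi_2 = \alpha_1 \phi_1 + \alpha_2 \phi_0$$

Hence,

$$\phi_0 = \frac{\sigma_Z^2 (1 - \alpha_2)}{1 - \alpha_2 - \alpha_1^2 - \alpha_2^2 - \alpha_1^2 \alpha_2 + \alpha_2^3} \tag{10}$$

$$\phi_1 = \frac{\sigma_Z^2 \alpha_1}{1 - \alpha_2 - \alpha_1^2 - \alpha_2^2 - \alpha_1^2 \alpha_2 + \alpha_2^3} \tag{11}$$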
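The closed forms for the lag-0 and lag-1 covariances and the recursion (9) can be checked numerically. The sketch below assumes hypothetical coefficients (alpha_1 = 0.5, alpha_2 = 0.3, unit shock variance); these values are illustrative only and are not taken from the paper.

```python
def ar2_autocovariances(a1, a2, sigma2, max_lag):
    """Autocovariances phi_0..phi_max_lag of a stationary AR(2) process.

    phi_0 and phi_1 come from the closed forms obtained by solving the
    lag-0, lag-1 and lag-2 equations; higher lags use recursion (9).
    """
    denom = 1.0 - a2 - a1 ** 2 - a2 ** 2 - a1 ** 2 * a2 + a2 ** 3
    phi = [sigma2 * (1.0 - a2) / denom, sigma2 * a1 / denom]
    for k in range(2, max_lag + 1):
        phi.append(a1 * phi[k - 1] + a2 * phi[k - 2])
    return phi[:max_lag + 1]


# Hypothetical coefficients satisfying the stationarity conditions.
a1, a2, sigma2 = 0.5, 0.3, 1.0
phi = ar2_autocovariances(a1, a2, sigma2, 10)
rho = [p / phi[0] for p in phi]   # autocorrelations, as used for matching
print([round(r, 4) for r in rho[:5]])
```

The lag-0 equation provides an independent consistency check, since it was not used in the recursion: phi_0 should equal alpha_1 phi_1 + alpha_2 phi_2 + sigma^2.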



4.4.1 An example 

Consider the following second-order stationary AR(2) process: 

We calculate the autocorrelation function of the AR(2) model using the equations
in section 4.4 and then use our numerical routines to obtain an MMPP
that has the same autocorrelation function for the first 4 lags (recall that the
mean has been subtracted out). The parameters of the best-fit MMPP turn out to be

$$Q = \begin{pmatrix} -0.015004 & 0.015004 \\ 0.108 & -0.108 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 10.0 & 0 \\ 0 & 0.4 \end{pmatrix}$$
Figure 10: A comparison of the autocorrelation functions



We then plot the first 10 autocorrelations of the AR process together with
the autocorrelations from a simulation of the derived MMPP model. Results are
shown in Fig. 10.

The AIC for the time series resulting from the MMPP was also calculated
and is plotted in Fig. 11. Clearly the order is 2, indicating that the order of the
original time series has been preserved.




Figure 11: AIC from analytical autocorrelation function 



4.5 MMPP for traffic data 

To fit an MMPP against real, monitored traffic data, we applied the method
to the e-mail traffic from observed data set Tt9702061439. We first estimate
the autocorrelation function of the time series comprising the observed arrival
instants, $\{A_i \mid 1 \le i \le n\}$. At lag $k$ ($0 \le k < n$), it is estimated as

$$\rho_k = \frac{\phi_k}{\phi_0}, \quad \text{where} \quad \phi_k = \frac{1}{n-k-1} \sum_{i=1}^{n-k-1} X_{i+k} X_i,$$

where $X_i = \Delta_i - \bar{\Delta}$ ($1 \le i \le n-1$), $\Delta_i = A_{i+1} - A_i$ and $\bar{\Delta} = E[\Delta_i]$.
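The estimator can be sketched in a few lines. The function name and the toy arrival instants below are assumptions for illustration, and normalising by the lag-0 value to obtain an autocorrelation is the usual convention rather than something the garbled source states explicitly.

```python
def autocorrelations(arrivals, max_lag):
    """Sample autocorrelations of the inter-arrival times at lags 0..max_lag.

    With n arrival instants there are n - 1 inter-arrival times, and the
    lag-k covariance averages the n - k - 1 available products.
    """
    deltas = [b - a for a, b in zip(arrivals, arrivals[1:])]
    m = len(deltas)                       # m = n - 1
    mean = sum(deltas) / m
    x = [d - mean for d in deltas]        # mean subtracted out

    def phi(k):
        return sum(x[i + k] * x[i] for i in range(m - k)) / (m - k)

    phi0 = phi(0)
    return [phi(k) / phi0 for k in range(max_lag + 1)]


# Toy arrival instants with alternating gaps 1, 2, 1, 2, ... (hypothetical).
arrivals = [0.0, 1.0, 3.0, 4.0, 6.0, 7.0, 9.0]
rho = autocorrelations(arrivals, 2)
print(rho)
```

For the alternating toy sequence the lag-1 autocorrelation is -1 and the lag-2 autocorrelation is +1, which makes the indexing easy to verify by hand.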

From these estimates, we match a 2-phase MMPP as in the previous sections,
using the mean, $\mu_1$, and $\rho_1$, $\rho_2$ and $\rho_3$ to determine the parameters of the MMPP,
and use autocorrelations at higher lags for validation. The MMPP that we
estimate is

$$Q = \begin{pmatrix} -1.1 & 1.1 \\ 0.7 & -0.7 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 9.0 & 0 \\ 0 & 2.0 \end{pmatrix}$$

Mean for the traffic data: 0.213761; mean for the MMPP model: 0.211765.
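The reported model mean can be reproduced from the fitted parameters. The sketch below assumes the standard result that the mean inter-arrival time of a stationary point process is the reciprocal of its mean arrival rate; the helper name is hypothetical.

```python
def mmpp2_mean_iat(q12, q21, lam1, lam2):
    """Mean inter-arrival time of a stationary 2-phase MMPP.

    The modulating chain spends a fraction q21 / (q12 + q21) of time in
    phase 1, so the mean arrival rate is the time-weighted average of the
    two Poisson rates.
    """
    pi1 = q21 / (q12 + q21)
    pi2 = q12 / (q12 + q21)
    return 1.0 / (pi1 * lam1 + pi2 * lam2)


# MMPP fitted to the e-mail traffic in this section.
print(round(mmpp2_mean_iat(1.1, 0.7, 9.0, 2.0), 6))
# "True solution" column of Table 2 (compare with the Mean row of Table 3).
print(round(mmpp2_mean_iat(0.5, 0.2, 0.9, 0.1), 5))
```

Both values agree with the figures quoted in the text (0.211765) and in Table 3 (3.04348) to the printed precision.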



5 The burst size distribution 

The fit in Fig. 6 led us to the conclusion that there is burstiness at 2 distinct
levels: at the primary level, due to batch arrivals of cells within a packet, with
inter-arrival times of order 200 - 1000 ns, and at a secondary level, due to the
effect of TCP, with inter-arrival times of order 2000 - 7000 ns.



5.1 Cell level 

To examine the burst size distribution at the ATM cell level, we plotted his- 
tograms for the numbers of cell arrivals within a packet. The distribution for 
heavy traffic is shown in Figure 12 and that for light traffic sites in Fig. 13. 

It is evident from the histograms that both heavy-utilisation channels and
low-utilisation channels have a similar burst size distribution. This can be
explained by the type of traffic that is flowing in the network. Most of the traffic
is TCP/IP traffic that has originated from an Ethernet LAN. The MAN has
been configured with AAL5, which is the data transport layer of ATM and is
used to send IP datagrams. When sending these, IP does not itself fragment to the
ATM cell size [5], [6], [9] (48 octets + 5 octets header information): instead,
it uses a Maximum Transmission Unit (MTU) of 9180 octets and allows AAL5 to
fragment the datagram to cells. At the LAN level, Ethernet imposes an MTU
of 1518 octets and a minimum of 64 octets. From this it is evident that the
minimum number of cells that will be sent is Ceil[64/48] = 2 and the maximum
in one batch is Ceil[1518/48] = 32. In the histograms showing the frequency
of burst-size distributions, three bin sizes, 2, 12 and 32, were found to occur
invariably in histograms for aggregate traffic, Web traffic and FTP traffic. Size 2
is by far the most frequent, due to the acknowledgement packets that are
involved in TCP/IP transfers. Web and FTP packets tend to be large and are
the largest constituents of the traffic. There is thus likely to be fragmentation
in the transportation of their packets. The peak at bin size 12 corresponds to
the IP specification that says that a router must always handle datagrams of
up to 576 octets: Ceil[576/48] = 12.
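The three characteristic burst sizes follow from a single ceiling division. The sketch below uses the text's simplification of 48 payload octets per cell and deliberately ignores AAL5 trailer and padding overhead; the function name is an assumption.

```python
CELL_PAYLOAD_OCTETS = 48  # ATM cell payload; the 5-octet header is extra


def cells_per_packet(packet_octets, payload=CELL_PAYLOAD_OCTETS):
    """Number of cells needed to carry a packet, i.e. Ceil[packet / payload]."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-packet_octets // payload)


for size in (64, 576, 1518):
    print(size, "octets ->", cells_per_packet(size), "cells")
```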

5.2 TCP Level 

The geometric distribution has been used widely to model burstiness in network 
traffic [10], [21], [14]. We investigate whether it is suitable for modelling the 
TCP batch size distribution from our traffic data. We fitted an exponential 
function, see Figure 14. A geometric random variable has probability
mass function $p_i = p(1-p)^{i-1}$ for $i \ge 1$ and some positive $p < 1$. Comparing
with the fitted exponential function, we have

= (12) 

To show the fit clearly, we have drawn the histograms using a line plot rather
than a normal box plot by joining the mid-points of each bar of the histogram.
To test the goodness of fit, we performed the $\chi^2$ test on the fits and obtained
the results shown in Table 4. Thus at the 97% significance level, we conclude
that the geometric distribution is not suitable for modelling TCP batch size
distributions in Figs. 14A - 14C, but seems to be suitable in the case of
Fig. 14D. The reason that Figs. 14A, 14B and 14C show such a bad fit is the
small values in the tails of the fitted exponential function. However, the test
on 14D does show that the geometric distribution may be a good model for
ATM e-mail batch sizes. The use of the generalised exponential (GE) distribution
would then model an e-mail arrival process in heavy traffic well. However, a
more general model would have to model more closely the MTUs of different
networks and thus the fragmentation that occurs as data travels through them.
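A goodness-of-fit check of this kind can be sketched as follows. This is not the authors' procedure: it assumes a maximum-likelihood fit (p equal to the reciprocal of the mean burst size) instead of the fitted exponential curve, pools the tail into the last bin, and uses hypothetical counts; comparing the statistic with a chi-square quantile is left to a table or statistics package.

```python
def geometric_pmf(i, p):
    """P(burst size = i) for a geometric distribution on 1, 2, 3, ..."""
    return p * (1.0 - p) ** (i - 1)


def fit_geometric(counts):
    """counts[i - 1] is the number of bursts of size i; returns p = 1 / mean."""
    total = sum(counts)
    mean_size = sum((i + 1) * c for i, c in enumerate(counts)) / total
    return 1.0 / mean_size


def chi_square_statistic(observed, expected):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def geometric_chi_square(counts):
    """Fit a geometric law and return (p, Pearson statistic)."""
    total = sum(counts)
    p = fit_geometric(counts)
    k = len(counts)
    expected = [total * geometric_pmf(i, p) for i in range(1, k)]
    expected.append(total * (1.0 - p) ** (k - 1))  # last bin pools sizes >= k
    return p, chi_square_statistic(counts, expected)


# Hypothetical burst-size histogram (sizes 1..7, then 8 and above).
counts = [512, 256, 128, 64, 32, 16, 8, 8]
p_hat, stat = geometric_chi_square(counts)
print(round(p_hat, 4), round(stat, 4))
```

Because one parameter is estimated from the data, the statistic would be compared with a chi-square distribution with (number of bins - 2) degrees of freedom.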

6 Conclusions 

The results presented in this paper show that ATM traffic shows very different 
behaviour depending on its intensity. This has a direct bearing on modelling 
input traffic in analytical models. 
Figure 14: Burst size distribution. Set: t9702041154







We have shown that heavy traffic on the MAN can be accurately modelled 
by a Poisson point process with 2 levels of burstiness. We have shown how 
the sizes of the bursts are a consequence of fragmentation within the networks 
and from the results in section 5.2, it is evident that a geometric distribution 
can be a good model for the burst size distribution in some cases. Though 
we suspect that a deterministic model based on the workings of TCP may be 
better, such models are not mathematically tractable in analytical modelling 
and queueing network models in particular. Moreover, experience suggests 
that many model outputs are sensitive to only mean burst size and that the 
assumption of geometrically distributed burst sizes is robust. 

Light traffic exhibits long-memory or long range correlation. We have pro- 
posed a method to model it with a MMPP parameterised using the autocorre- 
lations of the time series obtained from our data. The MMPP, of course, has 
the added advantage of being analytically tractable. The MMPP has been sug- 
gested as a model for ATM traffic, see [11], [25], [19] and a fitting method based 
on the EM algorithm [7] has been used to parameterise it in [16]. However, 
our approach is novel as we use correlation rather than the burstiness in the 
traffic to parameterise our MMPP model. It is our hope that this coupled with 
spectral expansion techniques will lead to building blocks for queueing network 
models for bursty, correlated arrivals. 

In time series analysis, periodic effects frequently emerge strongly in long 
time spans and it is a common method to use self-similarity in the study of such 
time series. It is therefore likely that in previous publications where mention of 
self-similarity and fractal-like behaviour has been made, a simple explanation 
of the observed behaviour may be that of periodicity which could be due to the 
TCP/IP protocols or usage patterns. Also many of the traffic measurements 
used in e.g. [20] and [26] were done prior to the advent of the web and therefore 
exclude the massive increase in the volume of traffic due to it. 

Finally, we have also proposed a new method to parameterise an MMPP.
Based on our preliminary investigations the results look encouraging. A general
problem is that it is not at all clear that an arbitrary autocovariance function
can be realised by some MMPP. There is little that can be done about this, but a
possible approach to overcoming it is the following. An observed time series
could first be modelled by an ARMA process using established techniques for
which software already exists, e.g. SPlus. This step is intended to characterise
the process generally, rather than represent one particular sample, which is prone
to producing significant estimation errors. If successful, the autocorrelation
function of the ARMA process would provide a more reliable representation of
the real traffic and the method of section 4.4 could be used.
Abstract

This contribution concerns the statistical characterization
of the traffic generated by a Local Area Network, with the
objective of flow control on Asynchronous Transfer Mode
networks. Many authors investigate the traffic measured
on broadband networks on long time scales and they all
conclude that the measured traffic is self-similar. Our approach
is rather different. We suspect that the apparent
self-similarity of the traffic is due to the presence of non
stationarities in a traffic with no self-similarity. To support
our intuition we test the stationarity of the measured traffic
on different time scales ranging from a few seconds to hours
of traffic. We then propose to model the measured traffic as
a locally stationary and semi-Markov process. We study
in detail the estimation of the parameters of semi-Markov
processes; both block and recursive estimation procedures
are presented. We finally generate three different
non stationary and semi-Markov processes whose parameters
match the estimates of the varying parameters of
the measured traffic. We show that if one relied on classical
visual indexes of self-similarity one would conclude that the
synthetic traffic is self-similar. These findings question the
consensus about the possible self-similarity of the traffic and
they lead the way to some realistic and tractable models.

Keywords: Ethernet, ATM, long range dependence, 
self-similarity, non stationarity, local stationarity, test for 
stationarity, hidden Markov model, Markov modulated 
Poisson process, block estimation, recursive estimation, 
variance-time analysis, R/S analysis. 




1 Introduction 



A large number of contributions have been devoted to the statistical
analysis of the traffic measured on broadband networks after the
pioneering paper by Willinger et al. [1], who presented for the very
first time the analysis of high quality LAN traffic data, recorded at
Bellcore. Soon afterwards, a large amount of data became available
(not only for LAN, but also WAN, VBR video traffic, and so on).

It is interesting to note that most of the contributors used more
or less the same statistical investigation methodology and came up
with more or less the same conclusions. The authors affirm that
the measured traffic is heavy tailed and long-range dependent, that
is to say $R(k) \sim k^{2H-2} L(k)$ as $k \to \infty$ with $0.5 < H < 1.0$, $R(k)$ denoting the
autocorrelation at lag $k$ of the series under study (inter-arrival times
or block packet counts) and $L(x)$ being a slowly varying function
at infinity, i.e. $\forall x > 0$, $L(rx)/L(r) \to 1$ as $r \to \infty$. These findings are at
variance with classical models of traffic such as the Poisson process
or the Markov modulated Poisson process [2] for which dimensioning
results have been established. It is known that quality measures
such as the overflow probability or the delay are strongly underestimated
by these traditional models if the real traffic has long-range
dependence.

It is often difficult to find a physical interpretation for self-similarity.
An explanation of the phenomenon that is significant
for teletraffic applications has been proposed by Taqqu [3]; Taqqu
proves that the aggregation of several ON/OFF renewal sources
with Pareto distributed ON and/or OFF periods leads asymptotically
to a self-similar traffic. Other classical models displaying
long-range dependence are the fractional Gaussian noise and the
fractional (p, d, q) ARIMA process. More recently semi-Markovian
models with an underlying Markov process on a countably infinite
instead of finite space, enabling a richer probabilistic behavior for
the underlying process, have been proposed by Robert et al. [4] [5].

It is interesting to note that long-range dependence only specifies
the autocovariance structure of the process for large lags, or
alternatively, the behavior of the spectral density function at zero
frequency. This is mainly because most of the work on long-range
dependent processes has been devoted to Gaussian processes and
more recently to linear processes: for these processes, second-order
structure in some sense specifies all the dependence structure of
the process. Very little is known about non-linear long-range dependent
processes, with the exception of fractional bilinear and ARCH models.

As stressed above it is somewhat surprising that most of the investigators
have adopted the methodology prescribed by Willinger
et al. [1] without trying other possible routes. The most questionable
part of their analyses is undoubtedly the fact that the traffic is
supposed to be second-order stationary on these long time scales.
Most of the analyses are done on hours of traffic. Based on these
large time scale data, the usual method consists in 'proving' long-range
dependence by computing several easily visualisable tests, in
the very first place the variance-time plot, despite the fact that this
index is known to be a poor estimator of the Hurst parameter from a
statistical point of view. Heavy-tailedness is 'proved' by displaying the
histogram of the data or by computing a goodness-of-fit measure,
such as the Kolmogorov-Smirnov or the chi-square tests. Curiously,
the critical levels of the tests are often computed as if the data were
independent, though the first claim is that they are long-range dependent;
this would of course largely modify the critical intervals
and even the speed of convergence of the test statistics.

Our intuition is that the conclusions could have been different
by adopting a perhaps more plausible assumption: the data are
not stationary on a global scale but only locally, i.e. stationary on
a finer scale and non stationary on long time scales. It has been
known for years ([6], [7], [8]) that deterministic jumps or trends in the
mean of a time series without long-range dependence can lead
to the false conclusion that the series is long-range dependent if
one relies on visual indexes of long-range dependence such as the
variance-time plot.
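The effect can be reproduced with a few lines of simulation. The sketch below is an illustration under stated assumptions, not the experiment of Section 5: it uses independent Gaussian samples, a single hypothetical jump of size 2 in the mean, and the usual variance-time reading H = 1 + slope/2, where the slope is that of log Var(X^(m)) against log m.

```python
import math
import random


def aggregated_variance(x, m):
    """Sample variance of the means of non-overlapping blocks of length m."""
    blocks = len(x) // m
    means = [sum(x[b * m:(b + 1) * m]) / m for b in range(blocks)]
    centre = sum(means) / blocks
    return sum((v - centre) ** 2 for v in means) / (blocks - 1)


def hurst_variance_time(x, block_sizes):
    """Least-squares slope of the variance-time plot, mapped to H."""
    pts = [(math.log(m), math.log(aggregated_variance(x, m)))
           for m in block_sizes]
    mx = sum(px for px, _ in pts) / len(pts)
    my = sum(py for _, py in pts) / len(pts)
    slope = (sum((px - mx) * (py - my) for px, py in pts)
             / sum((px - mx) ** 2 for px, _ in pts))
    return 1.0 + slope / 2.0


rng = random.Random(0)
n = 4096
iid = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Same noise, but the mean jumps by 2 half-way through (hypothetical shift).
shifted = [v + (2.0 if t >= n // 2 else 0.0) for t, v in enumerate(iid)]
sizes = [1, 2, 4, 8, 16, 32, 64, 128, 256]
h_iid = hurst_variance_time(iid, sizes)
h_shifted = hurst_variance_time(shifted, sizes)
print(f"H, stationary independent series: {h_iid:.2f}")
print(f"H, same series with a mean jump:  {h_shifted:.2f}")
```

The stationary series gives an estimate close to 0.5, whereas the series with the jump gives a value close to 1, although neither has any long-range dependence.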

To support our intuition we analyze a common traffic trace.
It was recorded at Bellcore and it was originally studied by Paxson
and Floyd [9], who concluded that this stream exhibits a high
degree of burstiness that cannot be explained by any Markovian
model and who discussed how this burstiness might mesh with self-similar
models of traffic. In Section 2 we test the stationarity of
the measured traffic on different time scales ranging from a few
seconds to hours of traffic. In Section 3 we introduce the Markov
modulated Poisson process and a new model that we propose after
careful analysis of different traffic data. In Section 4 we consider
the estimation of the parameters of the proposed model: both block
and recursive procedures of estimation are detailed. In Section 5
we show by simulation that the improper use of classical indexes
of long-range dependence for non stationary series leads to the
false interpretation of long-range dependence.



2 Tests for stationarity 

2.1 Principles 

2.1.1 General Framework 

Basically the tests of stationarity that we propose rely on the comparison
of different empirical statistics calculated on two neighbouring
segments of finite length of the stream. The hypothesis of stationarity
is rejected if the empirical statistics for the two neighbouring
segments are significantly different.

Denote by $\{X_t\}$ the sequence of the inter-arrival times (IAT)
from which a set of finite length $\{X_t\}_{1 \le t \le T}$ is observed. Suppose
that one aims at testing if this finite observation is strict sense
stationary. Denote by $T_1$ the presumed change point. For the sake
of homogeneity we also define $T_0 = 0$ and $T_2 = T$. We proceed as if
$\{X_t\}_{1 \le t \le T_1}$ and $\{X_t\}_{T_1+1 \le t \le T}$ were two realizations of finite length
$T_i - T_{i-1}$ of two processes $\{X_t^1\}$ and $\{X_t^2\}$.

In what follows we test the stationarity of the IAT process in the
sense of (i) the mean of the process, (ii) the sampled cumulative
distribution function $E(g(X_t))$ where $g(x) = (1_{A_1}(x), \cdots, 1_{A_N}(x))'$,
$(A_i)_{1 \le i \le N}$ being a partition of $\mathbb{R}^+$, and (iii) the first covariance
coefficients $E((X_t^2, X_t X_{t+1}, \cdots, X_t X_{t+M-1})')$. We introduce a new time
series $\{Z_t\}$ that is defined as (i) $Z_t = X_t$, (ii) $Z_t = g(X_t)$ or (iii)
$Z_t = (X_t^2, X_t X_{t+1}, \cdots, X_t X_{t+M-1})'$ depending on the non stationarities
that we want to detect.
2.1.2 A Central Limit Theorem

Different Assumptions are needed to establish a Central Limit The- 
orem for the vector of the empirical statistics. 

Assumption 1 is strict sense stationary. 

Assumption 2 ^ Ci > 0. 

Assumption 3 {Xt] is a-mixing with an a-mixing coefficient that 
verifies ^ < +oo. 

Assumption 4 qt +oo and qt = o{t). 

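Let us recall that the $\alpha$-mixing coefficient of the process $\{X_t\}$
is defined as $\alpha_n = \sup |P(A \cap B) - P(A)P(B)|$, the supremum
being taken on all sets $A \in \mathcal{M}_{-\infty}^{0}$ and $B \in \mathcal{M}_{n}^{+\infty}$, where
$\mathcal{M}_a^b = \sigma(X_t;\ a \le t \le b)$.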

The Assumption 3 is verified by many usual processes and in 
particular by a large class of Markov processes. It is in particular 
verified by ARMA processes when the probability density function 
of the innovation is strictly positive on R and by any finite state 
irreducible Hidden Markov Chain. 

Denote by $\bar{Z}_T^i = \frac{1}{T_i - T_{i-1}} \sum_{t=T_{i-1}+1}^{T_i} Z_t$ the empirical statistics for the
segment of index $i$ and denote by $\bar{Z}_T = ((\bar{Z}_T^1)', (\bar{Z}_T^2)')'$ the vector of
the empirical statistics for the two segments.

Theorem 1 Assume (A1-A2-A3). Then it holds that 

Vt{Zt - (11)' ® E(Zt)) ~ AAf{0, r), r=(^ 

where To = EtZoolzi'r) jz{r) = E(Z,Z;^,) - E(Z,)E(Z') 
and where A® B denotes the Kronecker product of A and B. 

Note that To is equal to the spectral density matrix of {Zt} at 
zero frequency. This remark permits the construction of a consis- 
tent estimator 

. +mr T T 

fo = ~ E 

0 1 1 
where uit = VT and w{k) = lk=o +

The Demonstration of Theorem 1 can be found in Appendix 1. 

2.1.3 Tests 

• Stationarity of the mean and of the marginal distri- 
bution 

As stated above the test for stationarity consists in comparing
$\bar{Z}_T^1$ and $\bar{Z}_T^2$. For the mean and for the marginal distribution
of the process we consider the difference between the empirical
statistics on the two neighbouring segments, $\bar{Z}_T^1 - \bar{Z}_T^2 = U\bar{Z}_T$, where
$U = (1\ \ {-1})$ in the test for stationarity of the mean of $\{X_t\}$ and
$U = (1\ \ {-1}) \otimes I_N$ in the test for stationarity of the marginal
distribution of $\{X_t\}$.

Theorem 2 Assume (A1-A2-AS). Then it holds that 

Vtuzt ~ AA/'(o, UToU'), rzT(r-i/")^r-‘/"ZT ~ x'(^) 

where r 2 denotes a square root ofT, F = F 2 (r 2 )'. 

• Stationarity of the first correlations 

In [10] Mauchly introduces the sphericity statistic to test
whether two Gaussian random vectors have the same covariance
matrix. Drouiche and Mokkadem ([11], [12]) generalize
this test as a test for spectral adequacy. We propose to use
this measure of similarity to test whether a process is second
order stationary.

For any positive sequence $\rho = (\rho_0, \rho_1, \cdots, \rho_{n-1})$ denote by
$T_n(\rho)$ the Toeplitz matrix $T_n(\rho) = \sum_{r=0}^{n-1} \rho_r M_r$ where $M_r$ is
the matrix whose entry $(i,j)$ is equal to $M_r(i,j) = \delta_r(|i - j|)$.
For any two positive sequences $\mu$ and $\nu$ the sphericity is
defined as the ratio

$$S(\mu, \nu) = \frac{|T_n(\mu)\, T_n^{-1}(\nu)|^{1/n}}{\frac{1}{n} \mathrm{Tr}(T_n(\mu)\, T_n^{-1}(\nu))}$$
Our idea is to derive the asymptotic distribution of $S(\bar{Z}_T^1, \bar{Z}_T^2)$
normalized by a factor that depends on the length $T$ of the
observation and to reject the hypothesis of stationarity if the
obtained value is lower than a prescribed threshold determined
by the false alarm probability $\alpha$.

Theorem 3 Assume (A1-A2-A3). Then it holds that 

2ts*{Zt) '^(7) ly ® E{Zt))z, z ~ V(o, r) 

where 1)'® E(Z^)) denotes the Hessian of S at point 

The proof of this result is based on a Taylor expansion
of $S$ at the point $(E(Z_t), E(Z_t))$. As $S$ is maximum at the point
$(E(Z_t), E(Z_t))$, a second order Taylor expansion is needed.

To establish the expression of the Hessian one needs the second order
differentials of $M \mapsto \log|M|$ and of $M \mapsto M^{-1}$:

$(dA, dB) \mapsto -\mathrm{Tr}(M^{-1}\, dA\, M^{-1}\, dB)$

$(dA, dB) \mapsto 2 M^{-1}\, dA\, M^{-1}\, dB\, M^{-1}$



• Thresholds 

Theorems 2 and 3 make it possible to reject the set of Assumptions
(A1-A3) with a false alarm probability of $\alpha$. If the obtained
statistic exceeds the $(1 - \alpha)$ quantile of the asymptotic
distribution one concludes that (A1-A3) is wrong, which
means that A1 and A3 cannot hold simultaneously.

Note that in Theorem 3 the asymptotic distribution is a
quadratic form in a multidimensional Gaussian random variable
and that the prescribed threshold is obtained by Monte-Carlo
simulation.
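For the scalar case (stationarity of the mean), the test can be sketched as below. This is a simplified stand-in for the statistic of Theorem 2, written under explicit assumptions: the long-run variance (spectral density at zero frequency) is estimated with a Bartlett window whose width is the square root of the sample size, residuals are demeaned segment by segment, and the statistic is compared with quantiles of the chi-square distribution with one degree of freedom. The function names are hypothetical.

```python
import math
import random


def long_run_variance(x, q):
    """Bartlett-window estimate of the spectral density at zero frequency."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]

    def gamma(k):
        return sum(d[i] * d[i + k] for i in range(n - k)) / n

    lrv = gamma(0)
    for k in range(1, q + 1):
        lrv += 2.0 * (1.0 - k / (q + 1)) * gamma(k)
    return lrv


def mean_change_statistic(seg1, seg2, q=None):
    """Squared difference of segment means, scaled by its estimated variance.

    Under stationarity and suitable mixing conditions the statistic is
    approximately chi-square distributed with one degree of freedom.
    """
    n1, n2 = len(seg1), len(seg2)
    m1 = sum(seg1) / n1
    m2 = sum(seg2) / n2
    # Demean each segment separately so that a jump in the mean does not
    # inflate the variance estimate.
    resid = [v - m1 for v in seg1] + [v - m2 for v in seg2]
    if q is None:
        q = int(math.sqrt(n1 + n2))
    g0 = long_run_variance(resid, q)
    return (m1 - m2) ** 2 / (g0 * (1.0 / n1 + 1.0 / n2))


CHI2_1_Q90, CHI2_1_Q99 = 2.706, 6.635  # chi-square(1) quantiles

rng = random.Random(1)
left = [rng.gauss(0.0, 1.0) for _ in range(500)]
right_same = [rng.gauss(0.0, 1.0) for _ in range(500)]
right_jump = [rng.gauss(3.0, 1.0) for _ in range(500)]  # hypothetical jump

stat_same = mean_change_statistic(left, right_same)
stat_jump = mean_change_statistic(left, right_jump)
print(f"no change:  {stat_same:.2f}")
print(f"mean jump:  {stat_jump:.2f} (99% threshold {CHI2_1_Q99})")
```

The segment with the jump produces a statistic several orders of magnitude above the 99% threshold, so stationarity of the mean is rejected for that pair of neighbouring segments.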

2.2 Results 

The simulations are replicated for thirteen time-scales ranging from 
six seconds to one hour and thirty minutes and for ten pairs of 
neighbour segments for each time-scale. 
Figure 1: Stationarity of the mean



In Figures 1 and 2 we plot the test statistic of Theorem 2 for all
the pairs of neighbouring segments. The 90% and 99% quantiles of
$\chi^2(N)$ are represented in dotted lines. (A1-A3) is rejected when
the statistic is superior to the $(1 - \alpha)$ quantile of $\chi^2(N)$.
In Figure 3 we plot the cumulative distribution function
$P(X < 2TS(\bar{Z}_T^1, \bar{Z}_T^2))$ for the asymptotic distribution
$X = Z'\, \nabla^2 S|_{(E(Z_t), E(Z_t))}\, Z$ where $Z \sim \mathcal{N}(0, \Gamma)$. The 90% and 99%
fractiles for the distribution of $X$ are represented in dotted lines.
(A1-A3) is rejected if $P(X < 2TS(\bar{Z}_T^1, \bar{Z}_T^2))$ is superior to $(1 - \alpha)$.

The conclusion of our simulations is that (A1-A3) is wrong for
most pairs of neighbouring segments for long time-scales. Classical
models such as the stationary Poisson process or the stationary
Markov Modulated Poisson Process are consequently not suited
to the traffic that we investigate on these long time-scales.
Figure 2: Stationarity of the marginal distribution



Note that Assumption A3 is wrong for long-range dependent
processes such as the Fractional Gaussian Noise or the fractionally
integrated autoregressive moving average process. Consequently
the tests developed do not make it possible to reject A1 for long-range
dependent processes. The difficulty of deciding between long-range
dependence and non stationarities has already been discussed by Duffield
et al. [7].

It is thus difficult to decide whether the evidence of self-similarity
mentioned by many authors results from a real self-similarity of the
traffic, from some non-stationarities that might have misled to
the conclusion of self-similarity ([6], [7], [8]), or from the coexistence
of both phenomena.
Figure 3: Stationarity of the first five correlations

3 Locally stationary and semi-Markov models

3.1 Vindication of the locally stationary and 
semi-Markov models 

As mentioned in Section 2 it is impossible to decide between the 
hypothesis of self-similarity and the hypothesis of local stationarity 
on the simple basis of the tests of stationarity that we have devel- 
oped. In what follows we favor the hypothesis of local stationarity 
by comparison with the hypothesis of long-range dependence. 

As one can see in Figure 4 the measured traffic is not significantly
correlated if one considers different slices of traffic that last
approximately ten seconds. This ten seconds time scale corresponds
to the shortest period of reallocation of the bandwidth at the User
Network Interface that one can afford; for shorter periods of reallocation
the control automata would be overloaded [13]. A mixture
of exponentials or of shifted exponentials seems to be a reasonable
model for the marginal distribution of the inter-arrival times on this
ten seconds time-scale. We think that the classical Markov modulated
Poisson process (MMPP; [2]) or a new model that we call
the shifted exponential hidden Markov model (SEHMM) are both
reasonable and tractable models of traffic on short time scales.

A solution to model time series that are suspected to be locally 
stationary consists in proposing a parametric model whose parame- 
ters are varying slowly with time. On longer time scales we propose 
to model the measured traffic as a semi-Markov process whose pa- 
rameters are varying slowly with time. This solution has many 
advantages among which the existence of recursive procedures of 
estimation of the parameters of the process and the existence of 
queuing results for such input processes. 



3.2 The Markov modulated Poisson process 

The Markov modulated Poisson process (MMPP) is a generalization
of the classical Poisson process that is commonly used as a
model for teletraffic data. This model makes it possible, to a certain extent,
to take the observed burstiness of the traffic into account while
remaining within a Markovian framework. For a review of the Markov
modulated Poisson process we refer the reader to the tutorial paper
of Meier-Hellstern [2].

3.2.1 Representation of the MMPP as a DSPP 

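The doubly stochastic Poisson process (DSPP) is a generalization
of the classical Poisson process. In the case of the doubly stochastic
Poisson process the intensity of the Poisson process is a non
negative stochastic process. Denote by $(\Lambda_t)$ a continuous time
non negative stochastic process, denote by $(T_n)$ the successive instants
of arrival and denote by $N(t)$ the associated count process
$N(t) = \sum_{n \ge 1} 1(T_n \le t)$. We suppose that $T_0 = 0$ and $N(0) = 0$.

The doubly stochastic Poisson process is defined by

(i) $\forall\, 0 \le t_1 < t_2 < t_3$, $\forall (p, q) \in \mathbb{N}^2$,

$$P(N_{t_2} - N_{t_1} = p,\ N_{t_3} - N_{t_2} = q \mid \mathcal{F}_\infty) = P(N_{t_2} - N_{t_1} = p \mid \mathcal{F}_\infty)\, P(N_{t_3} - N_{t_2} = q \mid \mathcal{F}_\infty) \tag{1}$$

where $(\mathcal{F}_t;\ t \ge 0)$ is the complete filtration associated to $(\Lambda_t)_{t \in \mathbb{R}}$.

(ii) $\forall t > s$, $\forall n \in \mathbb{N}$,

$$P(N_t - N_s = n \mid \mathcal{F}_\infty, \mathcal{G}_s) = \frac{1}{n!} \left( \int_s^t \Lambda_u\, du \right)^n \exp\left( - \int_s^t \Lambda_u\, du \right) \tag{2}$$

where $\mathcal{G}_s = \sigma(N_t;\ t \le s)$.

The Markov modulated Poisson process is the particular case of
the doubly stochastic Poisson process when the intensity process is
a finite-state and continuous time Markov process

$$\forall t > s,\quad P(\Lambda_t = \lambda_j \mid \mathcal{F}_s) = P(\Lambda_t = \lambda_j \mid \Lambda_s) \tag{3}$$

where $\mathcal{F}_s = \sigma(\Lambda_t;\ t \le s)$.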

$(\lambda_1, \lambda_2, \cdots, \lambda_K)$ is the set of states of the intensity process and
$K$ is the order of the Markov modulated Poisson process. The
most common parameterization of the Markov modulated Poisson
process is $(Q, \Lambda)$ where $Q$ is the transition rate matrix (infinitesimal generator) of the
finite state continuous time Markov intensity process and where $\Lambda$
is the diagonal matrix whose entry $(i, i)$ is equal to $\lambda_i$. The Markov
modulated Poisson process is then referred to as the $(Q, \Lambda)$ source.
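A two-state source of this kind is straightforward to simulate by competing exponential clocks. The sketch below is illustrative only: the function name and the parameter values are assumptions, and Q is read as a matrix of transition rates.

```python
import random


def simulate_mmpp2(q12, q21, lam1, lam2, n_arrivals, rng):
    """Return n_arrivals inter-arrival times of a 2-state MMPP.

    In state i the next event occurs after an exponential time with rate
    lambda_i + q_i; it is an arrival with probability lambda_i / (lambda_i + q_i)
    and a jump of the modulating chain otherwise.
    """
    switch = (q12, q21)
    rate = (lam1, lam2)
    state = 0
    since_last = 0.0
    iats = []
    while len(iats) < n_arrivals:
        total = rate[state] + switch[state]
        since_last += rng.expovariate(total)
        if rng.random() < rate[state] / total:
            iats.append(since_last)      # arrival: record and restart clock
            since_last = 0.0
        else:
            state = 1 - state            # modulating chain changes state
    return iats


# Hypothetical parameters, chosen only for illustration.
q12, q21, lam1, lam2 = 1.1, 0.7, 9.0, 2.0
rng = random.Random(42)
iats = simulate_mmpp2(q12, q21, lam1, lam2, 50000, rng)
sample_mean = sum(iats) / len(iats)
# Reciprocal of the stationary mean arrival rate, for comparison.
expected = 1.0 / ((q21 * lam1 + q12 * lam2) / (q12 + q21))
print(f"sample mean IAT {sample_mean:.4f}, expected {expected:.4f}")
```

The sample mean inter-arrival time should be close to the reciprocal of the stationary arrival rate, which gives a quick sanity check of the simulator.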

3.2.2 Representation of the MMPP as a MRP 

A Markov renewal process (MRP) is a discrete time stochastic process
$(X_n, Y_n)$ such that (i) $(X_n)_{n \in \mathbb{N}}$ is a finite state Markov chain.

(ii) $P(Y_n \le t,\ X_n = j \mid \mathcal{G}_{n-1}) = P(Y_n \le t,\ X_n = j \mid X_{n-1})$ (4)

where $\mathcal{G}_n = \sigma((X_m, Y_m);\ m \le n)$ is the filtration associated to
$(X_n, Y_n)$.

The most common parameterization of the Markov renewal process
is its stochastic transition matrix $F(t) = (F_{ij}(t))_{1 \le i,j \le K}$ with
$F_{ij}(t) = P(X_{n+1} = j,\ Y_{n+1} \le t \mid X_n = i)$. The transition matrix of
the imbedded Markov chain is then $P = F(+\infty)$.

Denote by $U_i = T_i - T_{i-1}$ the successive inter-arrival times and
denote by $\Lambda_i = \Lambda_{T_i}$ the state of the underlying Markov intensity
process at time $T_i$. We demonstrate in Appendix 2 that $(\Lambda_i, U_i)$ is
a Markov renewal process with Markov transition matrix
$F(t) = \int_0^t \exp((Q - \Lambda)u)\, du\, \Lambda$.
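Letting t tend to infinity in the integral gives the transition matrix of the imbedded chain in closed form, P = (Lambda - Q)^{-1} Lambda, provided all arrival rates are positive. The sketch below evaluates it for a two-state source; the parameter values and the function name are hypothetical.

```python
def embedded_transition_matrix(q12, q21, lam1, lam2):
    """P = F(+inf) = (Lambda - Q)^{-1} Lambda for a 2-state MMPP."""
    # Lambda - Q, with Q = [[-q12, q12], [q21, -q21]] and Lambda diagonal.
    a, b = lam1 + q12, -q12
    c, d = -q21, lam2 + q21
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # Right-multiplying by the diagonal Lambda scales the columns.
    return [[inv[0][0] * lam1, inv[0][1] * lam2],
            [inv[1][0] * lam1, inv[1][1] * lam2]]


# Hypothetical source, chosen only for illustration.
P = embedded_transition_matrix(1.1, 0.7, 9.0, 2.0)
for row in P:
    print([round(v, 5) for v in row])
```

Each row sums to one, as it must for a transition matrix: since the rows of Q sum to zero, (Lambda - Q) and Lambda agree when applied to the vector of ones.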

3.2.3 Representation of the MMPP as a HMM 

A hidden Markov model is a stochastic process $(X_n, Y_n)$ such that
(i) $(X_n)_{n \in \mathbb{N}}$ is a finite state Markov chain.

(ii) $P(Y_n \le t \mid X_n = i,\ \mathcal{G}_{n-1}) = P(Y_n \le t \mid X_n = i)$ (5)

where $\mathcal{G}_n = \sigma((X_m, Y_m);\ m \le n)$ is the filtration associated to
$(X_n, Y_n)$.

The terminology “hidden” stems from the fact that the imbedded
Markov chain $(X_n)_{n \in \mathbb{N}}$ is not observed directly and that any
statistical treatment should be based on the observation of $(Y_n)_{n \in \mathbb{N}}$
only.

The hidden Markov model is parameterized by the transition
matrix of the imbedded Markov chain $P = (P_{ij})_{1 \le i,j \le K}$ where
$P_{ij} = P(X_{n+1} = j \mid X_n = i)$ and by the either discrete or continuous
distribution of $Y_n$ conditionally to $X_n = i$.

Remark that any Markov renewal process $(X_n, Y_n)$ of order $K$
can be represented as a hidden Markov model of order $K^2$. Denote
by $(X_n, Y_n)$ a Markov renewal process with Markov renewal matrix
$F(t)$. Denote by $\tilde{X}_n = (X_{n-1}, X_n)$ the successive transitions of the
underlying Markov chain. Then $(\tilde{X}_n, Y_n)$ is a hidden Markov chain:
the transition matrix $\tilde{P}$ of the imbedded Markov chain $(\tilde{X}_n)_{n \in \mathbb{N}}$ is
a structured transition matrix with entries equal to

$$\tilde{P}_{(i_1, i_2)(j_1, j_2)} = 1_{i_2 = j_1}\, F_{j_1 j_2}(+\infty)$$

and the cumulative distribution function of $Y_n$ conditionally to
$\tilde{X}_n = (i, j)$ is $F_{ij}(t) / F_{ij}(+\infty)$.

3.3 The shifted exponential HMM 

In this Section we briefly present a new model that we have 
developed after carefully analyzing different traffic measurements on 
broadband networks [9] [13]. 

In Figure 4 we display the spectral density function of the 
inter-arrival times and the histogram of the natural logarithm of the 
inter-arrival times for five distant frames corresponding roughly to 
ten seconds of traffic. The spectral density function is estimated 
using either standard smoothed periodogram techniques or the Burg 
maximum entropy method [14, 15, 16, 17]. The logarithmic 
transformation makes it possible to visualize the distribution of the 
inter-arrival times in the region of short inter-arrivals. 

From a simple visual inspection of this plot it is seen that the 
spectral density function (sdf) of the process evolves with time. The 
non parametric estimate of the sdf shows that the spectral 
density in the neighbourhood of zero frequency seems compatible 
with short-range dependence. Note that the maximum entropy 
estimate of the sdf cannot reveal long-range dependence: maximum 
entropy spectral estimation amounts to fitting an autoregressive (AR) 
model to the first p autocovariance coefficients and to 
extrapolating the autocovariance coefficients after time index p as if 
the autocovariance sequence were that of an order p AR model. 
Thus the extrapolated autocovariance coefficients decrease 
exponentially for large lags. It is also interesting to note that the 
difference between the maximum and the minimum of the spectral 
density function is somewhat reduced, yet the data are not compatible 
with a Poisson process since the inter-arrival times are significantly 
correlated. 

Figure 4: Histogram and power spectral density of five distant 
slices of traffic lasting ten seconds 

The analysis of the histogram of the logarithm of the 
inter-arrival times and of its evolution also reveals interesting features. 
The inter-arrival times appear to follow a mixture of exponential 
distributions with different offsets. Our model of the marginal 
distribution is close to the model proposed by Kofman et al. [18]. 
According to these authors, who analyzed the traffic measured by Jain 
and Routhier [19], the distribution of the inter-arrival times is a 
mixture of exponentials. 

On the basis of these findings we propose to model the measured 
traffic as a hidden Markov model, the distribution of $U_n$ 
conditionally to $X_n = i$ being an exponential distribution with an 
offset time $s_i$: 



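$$P(X_t = i \mid \mathcal{F}_{t-1}, \mathcal{G}_{t-1}) = P(X_t = i \mid X_{t-1})$$

$$P(U_t \in A \mid X_t = i, \mathcal{F}_{t-1}, \mathcal{G}_{t-1}) = \int_A \lambda_i e^{-\lambda_i (u - s_i)}\, \mathbb{1}_{\{u \ge s_i\}}\, du$$

where we denote by $\mathcal{F}_t = \sigma(U_s; s \le t)$ and by $\mathcal{G}_t = \sigma(X_s; s \le t)$. 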
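
To make the model concrete, the following sketch draws inter-arrival times from a shifted exponential hidden Markov model. It is a minimal illustration under stated assumptions: the function name `simulate_sehmm`, the uniform choice of the initial state and all numerical values are hypothetical and are not estimates from the measured traffic.

```python
import random
from typing import List, Tuple


def simulate_sehmm(P: List[List[float]], rates: List[float],
                   offsets: List[float], n: int,
                   seed: int = 0) -> Tuple[List[int], List[float]]:
    """Draw n (state, inter-arrival) pairs from a shifted exponential HMM."""
    rng = random.Random(seed)
    K = len(P)
    state = rng.randrange(K)  # assumption: uniform initial state
    states, gaps = [], []
    for _ in range(n):
        # Hidden chain moves according to row `state` of P.
        state = rng.choices(range(K), weights=P[state])[0]
        # Inter-arrival time: offset s_i plus an exponential of rate lambda_i.
        gaps.append(offsets[state] + rng.expovariate(rates[state]))
        states.append(state)
    return states, gaps


P = [[0.95, 0.05], [0.10, 0.90]]   # hypothetical transition matrix
rates = [50.0, 2.0]                # hypothetical rates lambda_i
offsets = [0.001, 0.010]           # hypothetical offsets s_i
states, gaps = simulate_sehmm(P, rates, offsets, n=1000)
```

By construction no simulated inter-arrival time is shorter than the offset of the state that produced it, which is the feature visible in the histograms of the logarithm of the inter-arrival times.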



4 Estimation of the parameters of hidden Markov models 

In the previous section we have proposed different semi-Markov 
processes to model the traffic measured on broadband networks. One 
of these models (SEHMM) is a hidden Markov model. The second 
model (MMPP) can be represented as a hidden Markov model 
with a structured stochastic transition matrix, as demonstrated in 
the previous section. In this Section we detail both block and 
recursive procedures for estimating the parameters of a hidden Markov 
model. The block procedure is used on the time scales on which 
the measured traffic can be considered as stationary; the recursive 
procedures can be used to track the slow variations 
of the measured traffic on longer time scales. 

Remark that renewal processes such as the simple Poisson 
process are particular cases of hidden Markov models when the order 
of the model is $K = 1$; the procedures of estimation that we 
develop in this Section therefore apply to renewal processes. We verified 
that the offsets $(s_k)_{1 \le k \le K}$ do not vary significantly with time and we 
consequently do not consider the problem of the estimation of the offsets; 
in what follows we assign the offsets $(s_k)_{1 \le k \le K}$ a constant 
value. 

4.1 Unconstrained parameterization 

One should remark that the parameters of the Shifted Exponential 
Hidden Markov Model are subject to the following constraints 

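$$\forall\, 1 \le i, j \le K \qquad P_{ij} \ge 0$$

$$\forall\, 1 \le i \le K \qquad \sum_{j=1}^{K} P_{ij} = 1 \qquad (7)$$

$$\text{and} \quad \forall\, 1 \le i \le K \qquad \lambda_i > 0$$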



These constraints must be taken into account in the optimization 
steps of the estimation procedures that we present in what 
follows. Another solution consists in proposing an unconstrained 
parameterization of the model and in using standard optimization 
techniques for the problem without constraints. The second 
solution is retained in what follows. 

To overcome the positivity constraints $\lambda_i > 0$, $1 \le i \le K$, one 
can substitute $\Phi(\lambda_i)$ for $\lambda_i$, where $\Phi$ is a bijection from $\mathbb{R}_+^*$ onto $\mathbb{R}$, 
e.g. $\lambda_i \mapsto \log \lambda_i$. 

The constraints concerning $P_{i\cdot} = (P_{i1}, \ldots, P_{iK})$ are equivalent 
to setting $P_{ij} = (u^i_j)^2$ where $u^i = (u^i_j)_{1 \le j \le K}$ is a unit vector of 
$\mathbb{R}^K$, that is to say verifies $\sum_{j=1}^{K} (u^i_j)^2 = 1$. This remark makes it possible to 
parameterize $P_{i\cdot}$ by $(K - 1)$ Givens angles $(\alpha^i_j)_{1 \le j \le K-1}$. 

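Proposition 1 Any unit vector $u$ in $\mathbb{R}^K$ is the image of 
$e_K$, where $e = (e_1, \ldots, e_K)$ is the canonical basis of $\mathbb{R}^K$, by a product 
of $(K - 1)$ Givens rotations in $\mathbb{R}^K$: 

$$u = \prod_{k=1}^{K-1} G_{k,K}(\alpha_k)\, [0 \cdots 0\ 1]^T, \qquad 0 \le \alpha_k < 2\pi,$$

where $G_{k,K}(\alpha_k)$ coincides with the identity matrix except for the four entries 

$$[G_{k,K}]_{kk} = \cos\alpha_k, \quad [G_{k,K}]_{kK} = \sin\alpha_k, \quad [G_{k,K}]_{Kk} = -\sin\alpha_k, \quad [G_{k,K}]_{KK} = \cos\alpha_k,$$

and represents the matrix of rotation of angle $\alpha_k$ in the subspace $(e_k, e_K)$ 
of $\mathbb{R}^K$. 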
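
The parameterization of one row of the transition matrix by Givens angles can be sketched as follows. The function names are hypothetical, and the rotation convention (rotation in the plane spanned by $e_k$ and $e_K$, applied from the rightmost factor of the product to the leftmost) is an assumption of this sketch.

```python
import math
from typing import List


def unit_vector_from_angles(angles: List[float]) -> List[float]:
    """Apply K-1 Givens rotations in the planes (e_k, e_K) to the last basis vector."""
    K = len(angles) + 1
    u = [0.0] * K
    u[K - 1] = 1.0
    # The rightmost factor of the product acts first, hence the reversed loop.
    for k in reversed(range(K - 1)):
        c, s = math.cos(angles[k]), math.sin(angles[k])
        uk, uK = u[k], u[K - 1]
        u[k] = c * uk + s * uK
        u[K - 1] = -s * uk + c * uK
    return u


def stochastic_row(angles: List[float]) -> List[float]:
    """Row of a transition matrix obtained as the squared entries of the unit vector."""
    return [x * x for x in unit_vector_from_angles(angles)]


row = stochastic_row([0.3, 1.2, 2.5])  # hypothetical angles, K = 4
```

Whatever the angles, the entries of `row` are nonnegative and sum to one, so an optimizer can move freely in the space of angles without ever violating the constraints on $P_{i\cdot}$.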

4.2 Off line estimation of the parameters 

In this Section we briefly recall the basic principles of the 
Expectation Maximization (EM) algorithm [20]. The EM algorithm is 
a constructive method for computing the maximum likelihood estimate of the 
parameters of a mixture or of a finite state Hidden Markov Model. 
Denote by $\{(Y_t, X_t)\}_{1 \le t \le T}$ a finite length realization of a hidden 
Markov chain, where in this Section $X_t$ denotes the observation and $Y_t$ 
the hidden state. The basic idea of the EM algorithm consists in 
constructing a sequence $(\theta_k)_{k \in \mathbb{N}}$ that verifies $Q(\theta_{k+1}, \theta_k) \ge Q(\theta_k, \theta_k)$ 
where $Q(\theta_1, \theta_2) = E(L(X_{1:T}, Y_{1:T}; \theta_1) \mid X_{1:T} = x_{1:T}; \theta_2)$ and $L$ denotes the 
log-likelihood. A consequence of Jensen's inequality [21] is that the 
log-likelihood of the observed $\{X_t\}_{1 \le t \le T}$ is an increasing sequence 

$$L(x_{1:T}; \theta_{k+1}) \ge L(x_{1:T}; \theta_k)$$

and thus that it converges on a local maximum of the log-likelihood 
$\theta \mapsto L(x_{1:T}; \theta)$ since $L(x_{1:T}; \theta)$ is always less than zero. 

Each iteration of the EM algorithm can be decomposed into two 
steps. 

• Expectation Step: 

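This step yields the distribution of the imbedded 
Markov chain conditionally to the observations $X_{1:T} = x_{1:T}$ 
with an algorithm of linear complexity. It is based on the 
recursive construction of two lattices 

$$\alpha_t(i) = P(Y_t = i \mid X_{1:t} = x_{1:t}; \theta_k)$$

$$\beta_t(i) = \frac{P(X_{t+1:T} = x_{t+1:T} \mid Y_t = i; \theta_k)}{P(X_{t+1:T} = x_{t+1:T} \mid X_{1:t} = x_{1:t}; \theta_k)} = \frac{\sum_j P_{ij}\, p_j(x_{t+1})\, \beta_{t+1}(j)}{\sum_{i,j} \alpha_t(i)\, P_{ij}\, p_j(x_{t+1})} \qquad (8)$$

where $p_i$ denotes the probability density function of $X_t$ 
conditionally to $Y_t = i$. 

Straightforward manipulations lead to 

$$\gamma_t(i) = P(Y_t = i \mid X_{1:T} = x_{1:T}; \theta_k) = \alpha_t(i)\beta_t(i)$$

$$\zeta_t(i,j) = P(Y_{t-1} = i, Y_t = j \mid X_{1:T} = x_{1:T}; \theta_k) \qquad (9)$$

• Maximization Step: 

This step consists in maximizing $Q(\theta, \theta_k)$ with respect to 
$\theta$. Note that $Q(\theta, \theta_k)$ can be decomposed into two terms 
$Q(\theta, \theta_k) = Q_1(\theta, \theta_k) + Q_2(\theta, \theta_k)$ where 

$$Q_1(\theta, \theta_k) = E(L(Y; \theta) \mid X = x; \theta_k) = \sum_i \log \pi_i\, \gamma_1(i) + \sum_{i,j} \log P_{ij} \sum_{t=2}^{T} \zeta_t(i,j)$$

$$Q_2(\theta, \theta_k) = E(L(X \mid Y; \theta) \mid X = x; \theta_k) = \sum_t \sum_i \gamma_t(i)\, (\log \lambda_i - \lambda_i (x_t - s_i)) \qquad (10)$$

depend separately on $(P_{ij})_{1 \le i,j \le K}$ and on $(\lambda_i)_{1 \le i \le K}$; here $\pi$ denotes 
the initial distribution of the hidden chain. 

$Q(\theta, \theta_k)$ can thus be maximized separately with respect to 
$(P_{ij})_{1 \le i,j \le K}$ and with respect to $(\lambda_i)_{1 \le i \le K}$: 

$$P_{ij}^{(k+1)} = \frac{\sum_{t=2}^{T} \zeta_t(i,j)}{\sum_j \sum_{t=2}^{T} \zeta_t(i,j)}, \qquad \lambda_i^{(k+1)} = \frac{\sum_t \gamma_t(i)}{\sum_t \gamma_t(i)\,(x_t - s_i)} \qquad (11)$$

Note that for these update formulae the constraints 
(7) are automatically verified and thus one can use the 
most common parameterization $\theta = ((P_{ij})_{1 \le i,j \le K}, (\lambda_i)_{1 \le i \le K})$. 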
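
One EM iteration for the shifted exponential HMM can be sketched as below. This is an illustrative implementation under stated assumptions, not the authors' code: it uses the usual scaled forward-backward recursions, keeps the offsets and the initial distribution `pi` fixed, and assumes every observation is at least as large as the smallest offset. The function names and the sample data are hypothetical.

```python
import math


def shifted_exp_pdf(x, rate, offset):
    """Density of an exponential of given rate shifted by `offset`."""
    return rate * math.exp(-rate * (x - offset)) if x >= offset else 0.0


def em_step(x, P, rates, offsets, pi):
    """One EM iteration; returns (new P, new rates, log-likelihood at the input parameters)."""
    T, K = len(x), len(P)
    dens = [[shifted_exp_pdf(x[t], rates[i], offsets[i]) for i in range(K)]
            for t in range(T)]
    # Forward lattice, normalized at each step by c[t] = p(x_t | x_{1:t-1}).
    alpha, c = [[0.0] * K for _ in range(T)], [0.0] * T
    for t in range(T):
        for j in range(K):
            prior = pi[j] if t == 0 else sum(alpha[t - 1][i] * P[i][j] for i in range(K))
            alpha[t][j] = prior * dens[t][j]
        c[t] = sum(alpha[t])
        alpha[t] = [a / c[t] for a in alpha[t]]
    # Backward lattice with the same normalization.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(K):
            beta[t][i] = sum(P[i][j] * dens[t + 1][j] * beta[t + 1][j]
                             for j in range(K)) / c[t + 1]
    gamma = [[alpha[t][i] * beta[t][i] for i in range(K)] for t in range(T)]
    # M-step for the transition matrix: normalized sums of pairwise posteriors.
    new_P = []
    for i in range(K):
        row = [sum(alpha[t - 1][i] * P[i][j] * dens[t][j] * beta[t][j] / c[t]
                   for t in range(1, T)) for j in range(K)]
        total = sum(row)
        new_P.append([v / total for v in row])
    # M-step for the rates: weighted inverse mean of the excess over the offset.
    new_rates = [sum(gamma[t][i] for t in range(T)) /
                 sum(gamma[t][i] * (x[t] - offsets[i]) for t in range(T))
                 for i in range(K)]
    loglik = sum(math.log(ct) for ct in c)
    return new_P, new_rates, loglik


x = [0.02, 0.03, 1.5, 2.2, 0.025, 0.04, 3.1, 0.9, 0.015, 0.05]  # hypothetical data
offsets, pi = [0.01, 0.01], [0.5, 0.5]
P1, r1, ll0 = em_step(x, [[0.7, 0.3], [0.3, 0.7]], [20.0, 0.5], offsets, pi)
P2, r2, ll1 = em_step(x, P1, r1, offsets, pi)
```

The second call evaluates the log-likelihood at the parameters produced by the first one, so comparing `ll1` with `ll0` gives a quick check of the monotonicity property stated above.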



Leroux [22] proved the consistency of the maximum likelihood 
estimator under mild conditions in the case of hidden Markov mod- 
els and later Bickel and Ritov [23] proved the local asymptotic nor- 
mality of this estimator. 



4.3 On line estimation of the parameters 

Procedures of recursive estimation of hidden Markov model 
parameters have been considered by Holst and Lindgren [24], 
Krishnamurthy and Moore [25], Ryden [26] and Mevel [27]. In the papers 
of Holst et al. and of Lindgren et al. no convergence results are 
established although simulation studies show that their algorithms 
often work well in practice. Ryden and Mevel prove that their 
recursive estimators converge to the set of minima of the 
Kullback-Leibler divergence and are asymptotically normal under suitable 
conditions. In this Section we present the procedure of Mevel, which is 
used in the sequel to follow the slow variations of the parameters 
of the non stationary hidden Markov model. 

Note that the log-likelihood can be decomposed into 

$$L(x_{1:T}; \theta) = \sum_{t=1}^{T} \log p(x_t \mid x_{1:t-1}; \theta) = \sum_{t=1}^{T} \log \sum_{i=1}^{K} b_i(t; \theta)\, p_i(x_t; \theta) \qquad (12)$$

where $p_i(\cdot\,; \theta)$ denotes the probability density function of $X_t$ 
conditionally to $Y_t = i$ and where 

$$b_i(t; \theta) = P(Y_t = i \mid X_{1:t-1} = x_{1:t-1}; \theta) \qquad (13)$$

denotes the one-step ahead prediction filter. 

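Mevel proposes an algorithm of linear complexity to 
approximate the one-step ahead prediction filter as well as its gradient 
w.r.t. $\theta$ 

$$b_i(t+1; \theta) = \frac{\sum_j P_{ji}(\theta)\, p_j(x_t; \theta)\, b_j(t; \theta)}{\sum_{i,j} P_{ji}(\theta)\, p_j(x_t; \theta)\, b_j(t; \theta)} = F(b_\cdot(t; \theta), x_t) \qquad (14)$$

and similarly 

$$\frac{\partial b_\cdot}{\partial \theta}(t+1; \theta) = G\left(b_\cdot(t; \theta), \frac{\partial b_\cdot}{\partial \theta}(t; \theta), x_t\right) \qquad (15)$$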

Mevel proves that one can estimate the true parameter $\theta_{tr}$ by 
looking for an at least local minimum of the Kullback-Leibler 
divergence $K(\theta) = \lim_{n \to +\infty} \frac{1}{n} \log \frac{p(x_{1:n}; \theta_{tr})}{p(x_{1:n}; \theta)}$ with a recursive 
estimator 

$$\theta_t = P_G\left(\theta_{t-1} + \gamma_t \frac{\partial}{\partial \theta} \log p(x_t \mid x_{1:t-1}; \theta_{t-1})\right) \qquad (16)$$

where $G$ is a closed, bounded and convex subset of the parameter 
space, $P_G$ is a projection on this convex set and 
$\gamma_t$ is a step size decreasing as $t^{-\alpha}$ (17) 
where $\frac{1}{2} < \alpha < 1$. Note that in the non stationary 
context of tracking slowly varying parameters the step size is constant, $\gamma_t = \gamma$, 
and none of the convergence results mentioned above holds. 

The lattice procedure of estimation of the one-step ahead 
prediction filter and of its gradient makes it possible to estimate 

$$\frac{\partial}{\partial \theta} \log p(x_t \mid x_{1:t-1}; \theta) = \frac{\partial}{\partial \theta} \left( \log \sum_i b_i(t; \theta)\, p_i(x_t; \theta) \right)$$

Another solution consists in replacing the most common 
constrained parameters with the unconstrained parameters 
$\theta = ((\alpha^i_j)_{1 \le i \le K, 1 \le j \le K-1}, (\Phi(\lambda_i))_{1 \le i \le K})$ as presented in Section 4.1 so 
that no projection is needed. 
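
The prediction filter recursion (14) and the corresponding term of the log-likelihood (12) can be sketched as follows for the shifted exponential model. The function names and numerical values are hypothetical, and the gradient recursion (15) is deliberately left out of this sketch.

```python
import math


def shifted_exp_pdf(x, rate, offset):
    """Density of an exponential of given rate shifted by `offset`."""
    return rate * math.exp(-rate * (x - offset)) if x >= offset else 0.0


def predict_step(b, x, P, rates, offsets):
    """Return (b(t+1), log p(x_t | x_{1:t-1})) from b(t) and the new observation x_t."""
    K = len(b)
    # Weight each state by the density of the observation under that state.
    w = [shifted_exp_pdf(x, rates[j], offsets[j]) * b[j] for j in range(K)]
    norm = sum(w)  # equals p(x_t | x_{1:t-1})
    if norm == 0.0:
        raise ValueError("observation has zero density under every state")
    # Propagate the updated state distribution one step through the chain.
    b_next = [sum(P[j][i] * w[j] for j in range(K)) / norm for i in range(K)]
    return b_next, math.log(norm)


P = [[0.9, 0.1], [0.2, 0.8]]               # hypothetical transition matrix
rates, offsets = [10.0, 1.0], [0.01, 0.01]  # hypothetical rates and offsets
b_next, log_pred = predict_step([0.5, 0.5], 2.0, P, rates, offsets)
```

A long inter-arrival time such as the one used here is far more likely under the slow state, so the filter moves almost all of its mass to that state before applying the transition matrix.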



5 Indexes for self-similarity 

5.1 Variance time plot 

Denote by $\{X_t\}$ a series with long-range dependence, i.e. 

$$R(k) \sim k^{2H-2} L(k) \text{ as } k \to +\infty, \qquad 0.5 < H < 1.0 \qquad (18)$$

where $R(k) = E((X_t - \bar{X})(X_{t+k} - \bar{X}))$ denotes the autocorrelation 
of lag k of $\{X_t\}$ and where $L(x)$ is a slowly varying function at 
infinity, that is to say $L(rx)/L(r) \to 1$ as $r \to +\infty$ for all $x > 0$. 

It is well known [28] that if $X^{(m)}$ denotes the aggregated 
series $X^{(m)}_k = \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X_i$, the sample variance $\mathrm{var}\,X^{(m)}$ of the 
aggregated series behaves as $\mathrm{var}\,X^{(m)} \sim c\, m^{2H-2}$ as $m \to +\infty$. This 
permits the construction of a visual index of long-range dependence. 
One plots $\log \mathrm{var}\,X^{(m)}$ versus $\log m$ for various aggregation levels. 
If the series is long-range dependent the graphic fits a straight line 
with a slope $-1 < \beta = 2H - 2 < 0$. The slope of this straight line 
provides an estimate of the Hurst parameter $H$. 
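
The variance-time index can be computed as sketched below. The helper names, the aggregation levels and the synthetic independent Gaussian series are assumptions made for the illustration; the slope is obtained by an ordinary least squares fit of the points of the plot.

```python
import math
import random


def aggregate(x, m):
    """Means of x over non-overlapping blocks of size m (the aggregated series)."""
    nblocks = len(x) // m
    return [sum(x[k * m:(k + 1) * m]) / m for k in range(nblocks)]


def variance(x):
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def least_squares_slope(points):
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    num = sum((p[0] - mx) * (p[1] - my) for p in points)
    den = sum((p[0] - mx) ** 2 for p in points)
    return num / den


def hurst_variance_time(x, levels):
    """Estimate H from the slope beta = 2H - 2 of the variance-time plot."""
    points = [(math.log(m), math.log(variance(aggregate(x, m)))) for m in levels]
    beta = least_squares_slope(points)
    return 1.0 + beta / 2.0


rng = random.Random(0)
iid = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # series without long-range dependence
H_iid = hurst_variance_time(iid, [1, 2, 4, 8, 16, 32, 64, 128])
```

For an independent series the variance of the block means decreases like $1/m$, so the fitted slope should be close to $-1$ and the estimate close to $H = 0.5$.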



5.2 R/S analysis 

Another classical result is that a series with long-range 
dependence exhibits the Hurst effect. Consider $X_{1:n}$ a finite set of 
observations of length n. Denote by $\bar{X} = \frac{1}{n} \sum_{1 \le k \le n} X_k$ and 
$S^2 = \frac{1}{n} \sum_{1 \le k \le n} (X_k - \bar{X})^2$ the sample mean and the sample variance of 
$X_{1:n}$; then the range of $X_{1:n}$ is defined as $R = \max(0, W_1, \ldots, W_n) - 
\min(0, W_1, \ldots, W_n)$ where $W_t = \sum_{k=1}^{t} (X_k - \bar{X})$, $1 \le t \le n$. 

Figure 5: Variance-time plot 

The series is said to be self-similar if the rescaled adjusted range 
$E((R/S)(n))$ is 

$$E(R/S) \sim c\, n^H, \qquad c > 0, \quad 0.5 < H < 1.0 \qquad (19)$$

This phenomenon is known as the Hurst effect. Broadly 
speaking this result means that the integrated process visits more space 
when the increment process has long-range dependence than when 
it does not have any long-range dependence. This phenomenon 
was observed by Hurst [29] in 1951 in a study on the Nile floods. 
This result permits the construction of a second visual index of 
long-range dependence. One plots $\log E(R/S)$ versus $\log n$ and the 
series is self-similar if the graphic fits a straight line with slope 
$0.5 < H < 1.0$. The slope of this straight line provides an estimate 
of the Hurst parameter $H$. 

Figure 6: 

Other procedures of estimation of the Hurst parameter are 
periodogram regression techniques [30] and Whittle's maximum 
likelihood estimate. 
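
The R/S statistic and the associated slope estimate can be sketched as follows. The function names, the block sizes and the synthetic independent series are assumptions of this example; the expectation $E(R/S)$ is approximated by the average of $R/S$ over non-overlapping blocks, and blocks are assumed not to be constant (otherwise $S = 0$).

```python
import math
import random


def rescaled_range(x):
    """R/S statistic of one block: range of the centred partial sums over the standard deviation."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    w = w_max = w_min = 0.0  # the range includes 0, as in the definition of R
    for v in x:
        w += v - mean
        w_max = max(w_max, w)
        w_min = min(w_min, w)
    return (w_max - w_min) / s


def hurst_rs(x, block_sizes):
    """Least squares slope of log(mean R/S) against log(block size)."""
    pts = []
    for n in block_sizes:
        blocks = [x[k * n:(k + 1) * n] for k in range(len(x) // n)]
        mean_rs = sum(rescaled_range(b) for b in blocks) / len(blocks)
        pts.append((math.log(n), math.log(mean_rs)))
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    return (sum((p[0] - mx) * (p[1] - my) for p in pts) /
            sum((p[0] - mx) ** 2 for p in pts))


rng = random.Random(1)
iid = [rng.gauss(0.0, 1.0) for _ in range(16384)]  # series without long-range dependence
H_rs = hurst_rs(iid, [16, 32, 64, 128, 256, 512, 1024])
```

On short blocks the R/S slope of an independent series is known to sit somewhat above 0.5, which is one reason why this index should be read with caution.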

5.3 Results 

Our intuition is that the apparent self-similarity that many authors 
observe may be due to some non stationarities in a time series 
without long-range dependence. To support our intuition we simulate 
different non stationary and semi-Markov traffics whose 
parameters match the estimates of the parameters of the real traffic 
and we visualize the variance time plot and the R/S indexes for 
self-similarity of the traffic that we simulate. Three different non 
stationary models are proposed: the non stationary Poisson 
process, the non stationary Markov modulated Poisson process of order 
2 (MMPP(2)) and the non stationary shifted exponential hidden 
Markov model of order 4 (SEHMM(4)). 

We display in Figures 5 and 6 the results of the analyses for the 
real traffic [9] and for the three locally stationary synthetic 
traffics without self-similarity. As one can see in Figures 5 and 6, if 
one relied on the simple visual indexes for self-similarity one would 
conclude that the synthetic traffics with no self-similarity but with 
some non stationarities are self-similar. The estimates of the Hurst 
parameter that one deduces from the variance time analysis and 
from the R/S analysis are close if one takes into account the poor 
quality of these estimators. 



6 Conclusion 

In this contribution we have demonstrated that the apparent 
self-similarity of the traffic measured on broadband networks might 
stem from some undetected non stationarities in a traffic 
without self-similarity rather than from a real self-similarity of the 
traffic. This is a major finding since it means that one can take 
advantage of the numerous results that have been obtained over the years 
for semi-Markov traffics. On the contrary very little is known if the 
real traffic happens to be effectively self-similar. In the future we 
plan to support our findings by intensive simulation on other 
measurements of Ethernet traffic and of high bit rate video data. 

A Demonstration of Theorem 1 

The demonstration of Theorem 1 requires establishing the following 
Central Limit Theorem. 

Lemma 1 Assume (A1-A2’A3). Then it holds that 



E Z.-E(Z.))~AV(0,c-‘ro), l<i <2 

where To = Et=-ooMr) with 7^(r) = - E{Zt)E{Z'^) 

The demonstration of Lemma 1 can be found, for example, in [21]. 

To prove Theorem 1 we mimic the approach of Epps [31]. We 
define a new estimator where the first qx terms are removed 

E Z,. 

The basic idea consists in proving that \/TZj and y/TZ^ converge 
in distribution to the same normal distribution and in proving that 
VfZ^ and VTZ^ are asymptotically independent, in the sense that 

It results from the Davydov Theorem [21] that 

< a{qT) 0 
and thus $Z_T^1$ and $Z_T^2$ are asymptotically independent. 

Denote by DV = ~ ^ Eter-!,+i the differ- 

ence between \/T{Zj and y/TZj. It results from (A2-A4) that the 
covariance matrix of tends to zero as T tends to infinity. It 
then results from Lemma 1 and from the Slutsky Theorem [17] that 
$\sqrt{T}\,(Z_T - (1,1)^T \otimes E(Z_t)) \sim \mathcal{AN}(0, \Gamma)$, which concludes the proof 
of Theorem 1. 



B Representation of the (Q, Λ) source as 
a Markov renewal process 

Consider the $(Q, \Lambda)$ source introduced in Section 3.2. We are going 
to demonstrate that this $(Q, \Lambda)$ source can be represented as a 
discrete time Markov renewal process with Markov renewal matrix 
$F(t) = \int_0^t \exp((Q - \Lambda)u)\,du\,\Lambda$. For that matter we closely follow 
the demonstration of Meier-Hellstern [2]. 

Denote by $M(t)$ the matrix whose entry $(i,j)$ is equal to 

$$M_{ij}(t) = P(N(t) = 0, \lambda_t = \lambda_j \mid \lambda_0 = \lambda_i).$$

$M(t)$ satisfies the following Chapman-Kolmogorov equation: 

$$M_{ij}(t + dt) = \sum_{k \ne j} M_{ik}(t)\, Q_{kj}\, dt + M_{ij}(t)\,(1 + (Q_{jj} - \lambda_j)\, dt)$$

$$M_{ij}(t + dt) - M_{ij}(t) = \sum_k M_{ik}(t)\,(Q_{kj} - \Lambda_{kj})\, dt$$

$$M'(t) = M(t)(Q - \Lambda)$$

with $M(0) = I_K$ so that $M(t) = \exp((Q - \Lambda)t)$. 

Remark now that 

$$F_{ij}(dt) = P(U_{n+1} \in dt, \Lambda_{n+1} = \lambda_j \mid \Lambda_n = \lambda_i) = M_{ij}(t)\, \lambda_j\, dt$$

that is to say that 

$$F'(t) = M(t)\Lambda = \exp((Q - \Lambda)t)\Lambda$$

with the initial condition $F(0) = 0$ and thus $F(t) = \int_0^t \exp((Q - 
\Lambda)u)\,du\,\Lambda = (\Lambda - Q)^{-1}(I_K - \exp((Q - \Lambda)t))\,\Lambda$. 
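
As a check on the closed form, letting $t \to +\infty$ gives the transition matrix of the imbedded chain, $P = F(+\infty) = (\Lambda - Q)^{-1}\Lambda$, since the exponential term vanishes when all intensities are positive. The sketch below evaluates it for a two-state source with an explicit 2x2 inverse. The function name and the numerical values are hypothetical.

```python
def embedded_transition_matrix(q12, q21, lam1, lam2):
    """P = (Lambda - Q)^{-1} Lambda for Q = [[-q12, q12], [q21, -q21]], Lambda = diag(lam1, lam2)."""
    # Entries of A = Lambda - Q.
    a, b = lam1 + q12, -q12
    c, d = -q21, lam2 + q21
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # Right multiplication by the diagonal matrix Lambda scales the columns.
    return [[inv[0][0] * lam1, inv[0][1] * lam2],
            [inv[1][0] * lam1, inv[1][1] * lam2]]


P_embedded = embedded_transition_matrix(q12=0.5, q21=0.1, lam1=20.0, lam2=1.0)
P_symmetric = embedded_transition_matrix(q12=1.0, q21=1.0, lam1=1.0, lam2=1.0)
```

Both rows of the result sum to one, as they must for a transition matrix; in the symmetric case the diagonal entries equal $2/3$.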

Abstract 

This paper develops a new threshold-based traffic characterization which has 
immense flexibility together with the ability to be very specific. It considerably 
generalizes the moving window and peak rate allocation schemes. A policing 
mechanism based on this characterization has a simpler implementation than the 
Leaky Bucket and can offer advanced services. The characterization is based on 
traffic measurements and an easy-to-implement procedure is developed to obtain 
parameters directly from user data. The characterization can be used in a "cold 
start" mode and quickly adapted as more data becomes available. 



1 INTRODUCTION 

The Asynchronous Transfer Mode (ATM) technology is being developed to 
support multi-service high-speed communication networks. The stringent 
performance desired of networks, such as the ATM network, requires that user 
traffic be characterized for the purposes of 




(a) allocating network resources, 

(b) call admission control, 

(c) usage parameter control or policing of user traffic, and 

(d) charging. 

Currently, user traffic is specified in statistical terms such as the peak rate, the 
average rate, and perhaps the largest burst size allowed (see, for example, Rathgeb 
(1993), Tutofor (1990)). Deterministic rule based traffic descriptors are used for 
monitoring and policing (Doshi (1994)). Many authors have noted the inadequacy 
of some of these characterizations (Butto (1991), Rathgeb (1992, 1993), Tutofor 
(1990)). In particular, it is common knowledge that the average rate cannot be 
measured or policed properly (Rathgeb (1992), Tutufor (1990)). Only the peak rate 
lends itself to easy monitoring and allocation of network resources. However, the 
network capacity is underutilized if a peak rate based characterization is used. The 
higher the peak rate compared to the average rate, the more severely the network's 
resources are underutilized. 

The Leaky Bucket (Turner (1986)) is a device widely used for policing. It has two 
main parameters - the token pool size and the drain rate - and this limits the type of 
control that can be exercised over user traffic. It also limits the user’s ability to 
accurately describe its traffic. Uncertainty at call setup and infrequent but large 
traffic spikes (Rathgeb (1992, 1993), Tutufor (1990)) make it difficult to properly 
dimension the Leaky Bucket. The Leaky Bucket parameters must be chosen 
conservatively to guard against excessive cell losses. Commonly assumed user 
traffic patterns do not pass transparently through a Leaky Bucket. For example, the 
exponentially distributed ON-OFF pattern (see, for example, Choudhury (1996)) 
and the more general N-state Markov chain (see, for example, Elwalid (1993)), go 
through a reasonably sized Leaky Bucket with large cell losses (Johri (1995)) 
relative to the desired network cell loss rate of ~10 . Further, widely differing 
user traffic patterns can be compliant with the same Leaky Bucket. The network 
will have to provide enough capacity for the "worst case" pattern coming out of the 
Leaky Bucket. This is, in some cases, but not always (Doshi (1994)), a 
deterministic ON-OFF traffic pattern and is again a conservative assumption. 

Other commonly known policing mechanisms include the jumping window and 
moving window schemes (see, for example, Rathgeb (1993)). It is shown in 
Rathgeb (1993) that, from a performance perspective, the Leaky Bucket is superior 
to both these schemes. 

The previous discussion points to a strong need for a new policing mechanism to 
overcome the shortcomings of the Leaky Bucket. The main requirement is to more 
closely relate allowable traffic patterns to the parameters of the policing 
mechanism. This requires a new way of characterizing user traffic. In addition, a 
simple mechanism is required for adapting the policing mechanism if it is too strict 
or too lenient. In this paper, we focus on measurability, adaptability, and 
flexibility, to develop a new threshold-based characterization scheme (TCS). This 
is a pattern based characterization and will permit more precise specification of 
allowable user traffic patterns. We also develop an easily implementable policing 
mechanism based on TCS. It is important to note that we develop TCS in its most 
general form. Simple versions of TCS, with just a few thresholds, will suffice for 
single users. TCS will use more parameters for services linking networks or 
involving a large amount of traffic. 

This paper is organized as follows: In §2 we highlight the need for policing. In §3 
we show a typical user data stream which forms the basis of TCS. In §4 we 
develop TCS and discuss its properties. In §5 we develop a new policing 
mechanism based on TCS, provide a simple procedure for measuring traffic 
descriptors from user data, and address issues related to overdimensioning, 
sensitivity, and characterization under uncertainty. Finally, §6 contains the 
conclusions. 



2 THE NEED FOR POLICING 

Accurately characterizing user traffic is not an easy task. This is because the 
network will carry traffic from multiple users as well as other networks and may 
itself offer services, such as video services, which support user applications for 
which the traffic and demand is unclear. One common view is that the network (or 
the user) may never know enough about the user’s applications to adequately 
characterize the user’s traffic. Moreover, with high bandwidth more readily 
available, the network can afford to offer loosely policed and, consequently, 
inefficient services for carrying such traffic. The INTERNET provides this type of 
service. There is a per site access charge with users allowed to transmit as much as 
they like. 

Without adequate policing, the network opens itself to exploitation. Suppose two 
users subscribe to a service at a rate of X dollars per month (see Figure 1a). This 
service is not adequately policed, allowing the users to send as much traffic as they 
like. 

An entrepreneurial third user can subscribe to the same service (see Figure 1b). This third 
user can now offer to transmit the previous two users' traffic at a reduced rate of 
0.75X dollars per month, by multiplexing their traffic and sending it over its access 
line. As far as the original users are concerned, their traffic is going over the same 
network with the same performance guarantees as before but at 25% lower cost. 
The network, however, has lost 50% of its revenue while still carrying exactly the 
same traffic. The third user may need to provide some buffering or may have to 
subscribe to a slightly higher rate service to pull this off, but the profit would 
still be worth it. 
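
The figures quoted in this example can be checked with a line of arithmetic. The following restates them, writing $X$ for the monthly price and neglecting the reseller's extra buffering or higher-rate subscription, which the text mentions but does not quantify:

$$\text{network revenue: } 2X \to X, \qquad \frac{2X - X}{2X} = 50\%$$

$$\text{reseller margin: } 2 \times 0.75X - X = 0.5X$$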



Figure 1a: "Naive" users with loosely policed services 




Figure 1b: "Smart" user reselling access 




This example is not far fetched. The companies reselling INTERNET access are a 
prime example of such entrepreneurs. The only way to prevent this is to not allow 
the same traffic permitted by two X dollars per month services on a service which 
costs substantially less than 2X dollars per month. In other words, to effectively 
police user traffic. 

A contract is generally established between the user and the network specifying the 
traffic descriptors, the policing mechanism and the performance guarantees. The 
network may have no additional information about user traffic and, thus, has to 
infer its characteristics from the traffic descriptors specified in the contract. 
Consider two services with identical Quality of Service guarantees. The traffic 
descriptors in the first service allow a wide variety of traffic patterns while they are 
more limiting in the second. The network will have to devote more resources 
(buffers and/or bandwidth) to the first service than to the second, so that quality is 
assured even with the "worst case" user traffic pattern. Consequently, if services 
are priced according to the amount of network resources used, the first service will 
be more expensive. 

The more precisely the network can specify the allowable user traffic patterns from 
the traffic descriptors, the more efficiently it can allocate its resources. This leads 
to lower operating cost but does not solve the problem that it is difficult to 
precisely characterize user traffic. However, it is important to realize that the latter 
is the user’s responsibility. The network only needs to offer a variety of services, 
with lower prices for more precise specification of traffic descriptors. The user 
now has the option, for a given service quality, to choose vague traffic descriptors 
and pay higher prices, or choose precise traffic descriptors and pay lower prices. 



3 A SAMPLE USER TRAFFIC STREAM 

Several real examples of encoded video traffic were considered in Rathgeb (1992). 
One data set comprised 171,000 frames generated at the rate of 24 frames per 
second. It was seen that the number of bytes per frame was highly variable and 
that there were numerous spikes in the data. The number of bytes produced by the 
encoder output was also recorded in fixed time intervals. For example, each frame 
could be "sliced" into 30 fixed time intervals. The burstiness in the slices was 
greater than in the frames. 

For our purposes, it is sufficient to consider that a user produces a stream of cells 
(or packets, or frames, etc.) over time and that it is possible to measure and control 
the number of cells in fixed time intervals of length t. The value of t can be 
negotiated between the network and the user and can be larger than or equal to the 
smallest time interval over which measurements are available. The traffic pattern, 
for example, may look like the one shown in Figure 2. This data is hypothetical 
and not taken from Rathgeb (1992). Figure 2 shows that 20 cells arrived in the first 
time interval, 30 cells arrived in the second time interval, and so on. Let N denote 
the number of fixed time intervals for which user data is available, and let $C_i$ denote 
the number of cells in interval $i$, $i = 1,2,\ldots,N$. Thus, in Figure 2, $N = 100$ and 
$C_1 = 20$, $C_2 = 30$, and so on. The entire data stream is denoted as the sequence 
$\{C_i\}_{i=1}^{N}$. The maximum number of cells that can be produced in any time interval 
depends on the access line rate (or a negotiated peak rate) and is, say, 100 for the 
data in Figure 2. This traffic pattern will be used for illustrative purposes only. 

Figure 2: A Typical User Data Stream Displayed 



4 A THRESHOLD-BASED CHARACTERIZATION SCHEME 

This section develops the threshold-based characterization scheme (TCS). TCS 
tries to capture the heights and frequency of occurrence of the various peaks in the 
data. 

TCS characterizes (and polices) traffic over a moving window of n time intervals, 
where n is a positive integer. The value of n is negotiated between the user and the 
network and can differ from user to user. 

For example, n could be chosen as 10. Now, the first moving 
window consists of time intervals 1,2,...,10, the second of time 
intervals 2,3,...,11 and so on. 

The next step is to define the heights of the peaks. Again, to keep TCS very 
flexible, we allow a set of m thresholds, where m is a nonnegative integer. The 
value of m and the value of each individual threshold is also negotiated between the 
user and the network and can vary from user to user. Alternatively, the network 
can offer several ready-made solutions for widely used applications. Let 
$h = (h_1, h_2, \ldots, h_m)$ denote the values (in number of cells) of the m thresholds 
chosen. Further, threshold $h_{m+1}$ is always taken to correspond to the access line or 
peak rate, that is, it represents the maximum number of cells that can arrive in any 
time interval. The thresholds $h_j$ must be increasing in j, that is, 
$0 \le h_1 < h_2 < \cdots < h_m < h_{m+1}$. 

For example, m could equal 5 and a set of relevant thresholds can be 
$h = (0,20,40,60,80)$. In addition, $h_6 = 100$. 

The main idea in policing is to place limits on how much traffic is allowed in. This 
is done in TCS in terms of the maximum (number of) violations of each threshold 
allowed in the moving window of n time intervals. Here a violation implies that 
the threshold is exceeded. Let $v = (v_1, v_2, \ldots, v_m)$ denote the vector of 
maximum violations of thresholds $h = (h_1, h_2, \ldots, h_m)$, respectively, allowed by 
TCS. By definition, threshold $h_{m+1}$ is violated 0 times. We have assumed that the 
peak rate is either limited by the access line rate or is being policed separately. 
Since the thresholds $h_j$ are increasing in j, the maximum violations $v_j$ must be 
nonincreasing in j to be valid. 

For example, we can specify v = (10,8,6,4,2) to impose the 
requirement that in any 10 consecutive time intervals, threshold 80 
will not be exceeded more than 2 times, threshold 60 will not be 
exceeded more than 4 times, and so on. An easily implementable 
procedure is devised to measure these numbers from user data in 
§5.1. For the data in Figure 2, the threshold 80 is violated a 
maximum of 1 time (in any window which includes the interval 14 
or the interval 27) and threshold 60 is violated a maximum of 3 
times (in windows which include intervals 24, 27, & 33). 
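
Measuring the maximum violations from a data stream amounts to sliding a window of n intervals over the cell counts. The following is only a sketch of that idea: the function names and the sample counts are hypothetical (they are not the data of Figure 2), and it is not the measurement procedure of §5.1.

```python
from typing import List, Sequence


def max_violations(cells: Sequence[int], thresholds: Sequence[int], n: int) -> List[int]:
    """Largest number of threshold violations in any window of n consecutive intervals."""
    if n <= 0 or n > len(cells):
        raise ValueError("window size n must satisfy 1 <= n <= len(cells)")
    result = []
    for h in thresholds:
        exceeded = [1 if c > h else 0 for c in cells]  # a violation: threshold exceeded
        count = sum(exceeded[:n])                      # violations in the first window
        best = count
        for i in range(n, len(exceeded)):
            count += exceeded[i] - exceeded[i - n]     # slide the window by one interval
            best = max(best, count)
        result.append(best)
    return result


def complies(cells, thresholds, limits, n) -> bool:
    """True if the stream never exceeds the allowed violations v in any window."""
    return all(m <= v for m, v in zip(max_violations(cells, thresholds, n), limits))


cells = [20, 30, 85, 10, 65, 70, 5, 90, 40, 15, 0, 25]  # hypothetical cell counts
h = [0, 20, 40, 60, 80]
measured = max_violations(cells, h, n=10)
```

For these hypothetical counts the measured vector is nonincreasing, as required, and the stream would comply with the limits $v = (10,8,6,4,2)$ of the example.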

Threshold Characterization Scheme: TCS consists of the time interval duration 
t, the window size n, the number of thresholds m, the threshold values $h$ and $h_{m+1}$, 
and the maximum violations $v$. 

4.1 Advantages of TCS 

The most important advantage is that TCS is based on user data streams. All 
parameters can be directly measured as well as observed visually. This allows a 
seamless transition from data measurements to policing. The measurements show 
how the data is varying and the policing is based on limits placed on the amount of 
variation allowed. The fact that both measurements and limits are in terms of the 
same quantities makes TCS very intuitive and easy to understand. This is not true 
for the second Leaky Bucket parameter - the token pool size. It is supposed to 
represent the maximum burst size allowable, but whether an actual burst of this 
size will be allowed in or not depends on the token pool state at the time of the start 
of the burst. Consider a Leaky Bucket with a drain rate of 1 cell/msec and a token 
pool size of 100. A burst of 100 cells at the peak rate of 2 cells/msec will be 
transmitted over 50 msecs. Consequently, cells will be lost unless the token pool 
has at least 50 tokens at the start of transmission, since only 50 tokens will be 
generated during transmission of the burst. This is highly unpredictable and is the 
factor which makes this Leaky Bucket parameter undesirable. 
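
The dependence on the initial token pool state can be illustrated with a small discrete-time simulation of the example above. This is a sketch under stated assumptions: time advances in steps of 1 msec, tokens are credited at the start of each step before the cells of that step arrive, and the function name `lost_cells` is hypothetical.

```python
def lost_cells(initial_tokens: int, pool_size: int = 100, token_rate: int = 1,
               peak_rate: int = 2, burst_cells: int = 100) -> int:
    """Cells lost when a burst at the peak rate meets a token pool with a given initial fill."""
    tokens = min(initial_tokens, pool_size)
    remaining = burst_cells
    lost = 0
    while remaining > 0:
        # Tokens generated during this msec, capped at the pool size.
        tokens = min(pool_size, tokens + token_rate)
        arrivals = min(peak_rate, remaining)
        passed = min(arrivals, tokens)  # each admitted cell consumes one token
        tokens -= passed
        lost += arrivals - passed
        remaining -= arrivals
    return lost


losses = {start: lost_cells(start) for start in (0, 49, 50, 100)}
```

With at least 50 tokens at the start the whole burst passes, with 49 tokens one cell is lost, and with an empty pool half of the burst is lost, which is the unpredictability described in the text.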

Another important advantage of TCS is that it is very flexible, in that any number 
and values of thresholds can be chosen, and yet can be made as precise as desired. 
This is, of course, at the expense of having a larger number of parameters. The 
Leaky Bucket, on the other hand, cannot be made precise. 

As transmission speeds continue to increase, the time to transmit a cell becomes 
smaller and smaller. Policing can either be done cell by cell, as with the Leaky 
Bucket, or over small time intervals by limiting the number of cells allowed in 
these intervals. In both cases, it is imperative that the policing mechanism be very 
simple to implement and quick to run. This is, in fact, the most attractive feature of 
the Leaky Bucket and it is natural to expect that a policing mechanism based on 
TCS will be much more complex. In section 5, we will first develop a policing 
mechanism and show, contrary to expectation, that it is indeed quite simple. 

4.2 Generalizations of TCS 

The maximum violations in TCS control the number of times peaks can occur in 
any window. A user can still bunch the peaks together within a window. 
Additional traffic descriptors can be introduced to control the spacing between the 
peaks. In a generalized TCS, we do this by specifying the minimum number of 
time intervals which must separate two successive violations of the same threshold. 

For example, we can impose the requirement that any two 
successive violations of threshold 80 must have at least 10 time 
intervals separating them, and so on. 

A variation of TCS can be developed using the jumping window scheme. The full 
development of these generalizations is deferred for the future. 

4.3 Generality of TCS 

TCS can be used to police the peak rate. Let x be the number of cells 
corresponding to the peak rate over a time interval. Then, a threshold can be set 
to x with its maximum number of violations set to 0. This will prevent more than 
x cells being transmitted in any time interval. Note that this is not identical to 
policing the peak rate with a Leaky Bucket type mechanism. 

With n = 1 and m = 1, TCS reduces to the Jumping Window (Rathgeb (1993)) 
scheme with a window of duration t. TCS also generalizes the Moving Window 
scheme (Rathgeb (1993)) by allowing much finer granularity of control within each 
window, defined by the various thresholds instead of a single number dictating the 
maximum allowance in each window. 



5 BASIC TCS-BASED POLICING MECHANISM 

This section describes a policing mechanism based on TCS. The following 
variables are used: 

B is an m x (n-1) matrix of bits, used to monitor violations of the m thresholds 
over the last n-1 time intervals, with B_ij = 1 if threshold h_i is violated in the 
j-th previous time interval and 0 otherwise. 

u is a vector of m integer variables used to monitor the total number of 
violations of threshold h_i in the last n-1 time intervals. Thus, u_i = sum of 
elements in row i of matrix B. 

The variable u is not strictly necessary, since the same information can be derived from B. 

The basic policing scheme is as shown in Figure 3. The policing mechanism 
cycles through steps 1-3. An optional step 4 can be included to precalculate 
certain information which allows this procedure to run even faster. Step 2 is where 
all the policing is done. Note that this is the simplest step in the whole procedure 
and even simpler than the corresponding policing step in the Leaky Bucket 
mechanism. The other steps simply update the variables of the policing 
mechanism. Step 4 is performed in parallel with step 2. 

We now illustrate how the steps 0, 1, 3 and 4 are performed. Assume the following 
TCS parameters: m = 5, n = 10, h = (0,20,40,60,80), h_{m+1} = 100, and 
v = (8,6,3,1,1). 

Step 0: Initialize Variables 

Set B_ij = 0 and u_i = 0 for i = 1,2,...,m and j = 1,2,...,n-1. 

Assume that the current values of the variables are given below. These values are 
not based on the data in Figure 2. 

        1 0 1 1 0 1 1 0 1 
        0 0 1 1 0 1 1 0 0 
    B = 0 0 0 1 0 0 1 0 0        u = (6,4,2,1,0) 
        0 0 0 1 0 0 0 0 0 
        0 0 0 0 0 0 0 0 0 

Let the current time interval be 10. The value of B indicates that threshold 5 (=80) 
was not violated in time intervals 1-9, threshold 4 (=60) was violated only in the 
4th previous time interval (column 4 of B), and so on. The value of u indicates that 
threshold 1 (=0) was violated a total of 6 times in time intervals 1-9, threshold 2 
(=20) 4 times, and so on. 

Step 1: Determine Highest Applicable Threshold in Next Time Interval 

In a "C"-like language, this is easily accomplished as follows: 

    k = 1; 
    while (k <= m && u_k < v_k) ++k; 

The resulting value of k yields the highest applicable threshold h_k. 

For the variable values given earlier, k = 4 and threshold h_4 (=60) is the highest 
applicable threshold in the next time interval. 
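
Step 1 can be sketched in Python as follows. The function name highest_applicable_threshold and the convention of storing h_{m+1} as the last element of the tuple h are assumptions made for this illustration; the values of u and v are the ones used in the example above.

```python
def highest_applicable_threshold(u, v):
    """Return the 1-based index k of the highest applicable threshold.

    k is the first threshold whose violation budget v_k is already used up,
    or m + 1 when no budget is exhausted.
    """
    m = len(v)
    k = 1
    while k <= m and u[k - 1] < v[k - 1]:
        k += 1
    return k


h = (0, 20, 40, 60, 80, 100)   # h_1 .. h_m followed by h_{m+1}
v = (8, 6, 3, 1, 1)            # maximum violations
u = (6, 4, 2, 1, 0)            # violations seen in the last n-1 intervals

k = highest_applicable_threshold(u, v)
print(k, h[k - 1])
```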

Step 3: Monitor Violation and/or Update Variables 

Let threshold d (< k) be the highest threshold violated. 

The value of d depends on the number of cells transmitted in the last interval. 
Update the variables as follows: 

    B:  Delete column n-1. 
        Shift other columns to the right. 
        Set B_i1 = 1 if i <= d and 0 otherwise. 
    u:  row sums of B. 

Assume that 35 (< h_4 = 60) cells were transmitted in the last time interval. Then, 
d = 2, and threshold h_2 (=20) is the highest threshold violated. The new values of 
the variables are given below. 

        1 1 0 1 1 0 1 1 0 
        1 0 0 1 1 0 1 1 0 
    B = 0 0 0 0 1 0 0 1 0        u = (6,5,2,1,0) 
        0 0 0 0 1 0 0 0 0 
        0 0 0 0 0 0 0 0 0 

After each time interval, steps 3 and 1, in that order, yield the highest threshold 
applicable for the following time interval. The highest applicable threshold with 
the new variable values is again h_4. 
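
The update in Step 3 can be sketched in Python as follows, using the matrix B of the example. The helper names highest_violated and update, and the representation of B as a list of rows, are assumptions made for this illustration.

```python
def highest_violated(cells, h):
    """Index d of the highest threshold exceeded (0 if none); h is increasing."""
    return sum(1 for threshold in h if cells > threshold)


def update(B, d):
    """Drop the oldest column, shift right, and record the newest interval.

    Row i (0-based) holds threshold i+1, so the new first column is 1
    exactly when i + 1 <= d.
    """
    new_B = [[1 if i < d else 0] + row[:-1] for i, row in enumerate(B)]
    u = [sum(row) for row in new_B]
    return new_B, u


h = (0, 20, 40, 60, 80)
B = [[int(bit) for bit in row] for row in
     ("101101101", "001101100", "000100100", "000100000", "000000000")]

d = highest_violated(35, h)
new_B, u = update(B, d)
for row in new_B:
    print("".join(str(bit) for bit in row))
print(u)
```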

Step 4: Precalculate All Possible Values of Highest Applicable Threshold 

There are k - 1 (=3) possible values of d the next time Step 3 is evaluated. It is 
easy to precalculate the highest applicable thresholds (repeat calculations in Step 3) 
for each of these possibilities. These are k = 4, 2 and 2, corresponding to d = 1, 2 
and 3, respectively. With precalculations, steps 3 and 1 become very simple. Step 
4 is the only complicated step. However, it is done in parallel with step 2, and the 
whole duration (t) of the time interval is available for performing step 4. 
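
The precalculation in Step 4 can be sketched in Python as follows. The function name precalculate is an assumption made for this illustration; B is the updated matrix of the example and v = (8,6,3,1,1). For each possible d the sketch applies the Step 3 update to the row sums only and then repeats the Step 1 search.

```python
def precalculate(B, v, k):
    """Map each possible d (1 .. k-1) to the next highest applicable threshold."""
    m = len(v)
    u = [sum(row) for row in B]
    outgoing = [row[-1] for row in B]      # oldest column, dropped in Step 3
    table = {}
    for d in range(1, k):
        # Row sums after the Step 3 update for this value of d.
        u_next = [u[i] - outgoing[i] + (1 if i < d else 0) for i in range(m)]
        # Step 1 applied to the hypothetical new row sums.
        k_next = 1
        while k_next <= m and u_next[k_next - 1] < v[k_next - 1]:
            k_next += 1
        table[d] = k_next
    return table


v = (8, 6, 3, 1, 1)
B = [[int(bit) for bit in row] for row in
     ("110110110", "100110110", "000010010", "000010000", "000000000")]
table = precalculate(B, v, k=4)
print(table)
```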

The actual policing in Step 2 is done with just a counter, since cells are allowed in 
up to h_k cells and then discarded (or marked and let in). The counter mechanism 
leads to a simpler implementation than the counter and timer required for the Leaky 
Bucket. Because the full duration (t) of the time interval is available for step 4, the 
number of thresholds etc. has little impact on complexity. Moreover, advanced 
features can be built on this basic TCS based policing mechanism (see §5.7). 

5.1 Measuring Threshold Violations 

Let V_y(x) denote the indicator function of the event {x > y}, that is, V_y(x) equals 
1 if x > y and 0 otherwise. Then, V_{h_j}(c_i) indicates if the j-th threshold is violated 
in the i-th time interval, j = 1,2,...,m, i = 1,2,...,N. The sequence {V_{h_j}(c_i)} is 
a sequence of 0's and 1's, that is, a bit stream. A simple program, perhaps 
implemented in hardware, can produce these m bit streams. 

The observed violations v̂_j of threshold h_j can be calculated with the following two 
steps: 

    Step a:  count = v̂_j = sum over i = 1,...,n of V_{h_j}(c_i) 

    Step b:  For (i = n + 1; i <= N; ++i) { 
                 count = count + V_{h_j}(c_i) - V_{h_j}(c_{i-n}); 
                 If (v̂_j < count) then v̂_j = count; 
             } 

For the data in Figure 2, we obtain v̂ = (10,9,3,3,1). 
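
Steps a and b amount to a sliding-window maximum over each violation bit stream, which can be sketched in Python as follows. The function name observed_violations and the cell counts in the example are hypothetical; they are not the data of Figure 2.

```python
def observed_violations(cells, thresholds, n):
    """Largest number of violations of each threshold in any window of n intervals."""
    result = []
    for h in thresholds:
        bits = [1 if c > h else 0 for c in cells]   # violation bit stream
        count = sum(bits[:n])                       # Step a: first window
        worst = count
        for i in range(n, len(bits)):               # Step b: slide the window
            count += bits[i] - bits[i - n]
            if worst < count:
                worst = count
        result.append(worst)
    return result


# Hypothetical cell counts per time interval.
cells = [0, 25, 45, 10, 85, 30, 0, 65, 22, 5, 50, 90]
v_hat = observed_violations(cells, (0, 20, 40, 60, 80), n=5)
print(v_hat)
```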

5.2 Maximum Allowance 

The peak rate based allocation can be very wasteful. A total of 1,940 cells are sent 
in by the user in the 100 intervals in Figure 2. If the user was allowed a peak rate 
of 100 cells in each time interval, the user could have sent a maximum of 10,000 
cells. Hence, less than 20% of the peak allowance is actually used and more than 
80% of the capacity is wasted with a peak rate based allowance. 

TCS is a subpeak based allocation and, hence, should lead to reduced 
overdimensioning of the network. The maximum number of cells allowed by v 
over a window of size n is given by 

$$ w(v) = h_{m+1} v_m + h_1 (n - v_1) + \sum_{j=2}^{m} h_j (v_{j-1} - v_j) $$

since the user can send h_{m+1} cells no more than v_m times, h_m cells no more 
than v_{m-1} - v_m times, and so on. With v = (10,9,3,3,1), the user can send in no 
more than w(v) = 520 cells in 10 time intervals. This allocated capacity is 52% 
of the capacity allocated with a peak rate based allocation. Dividing w(v) by n 
yields the maximum allowance on a per time interval basis. 
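
The maximum allowance can be evaluated with a few lines of Python. The function name max_allowance and its argument order are assumptions made for this illustration; h holds h_1,...,h_m and h_peak stands for h_{m+1}.

```python
def max_allowance(v, h, h_peak, n):
    """Maximum number of cells admitted by v over a window of n intervals."""
    m = len(v)
    # h_{m+1} may be sent v_m times; at most h_1 in the n - v_1 remaining intervals.
    total = h_peak * v[m - 1] + h[0] * (n - v[0])
    # Up to h_j cells in the v_{j-1} - v_j intervals that violate h_{j-1} but not h_j.
    for j in range(1, m):
        total += h[j] * (v[j - 1] - v[j])
    return total


h = (0, 20, 40, 60, 80)
w = max_allowance((10, 9, 3, 3, 1), h, h_peak=100, n=10)
print(w, w / 10)
```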

5.3 Sensitivity to Tunable Parameters 

As mentioned earlier, the parameters t, n, and m, and thresholds h, h_{m+1}, can be 
tailored by the network to suit each user's needs. In Table 1, we have calculated v̂ 
for various values of these parameters for the data in Figure 2. Next, we calculated 
w(v)/n assuming v = v̂. The smaller the value of w(v)/n, the smaller the 
overdimensioning. Recall that w(v)/n = 100 corresponds to the peak rate. As can 
be seen from Table 1, larger values of m and n lead to smaller overdimensioning 
and a characterization which is closer to the actual user data distribution. 



Table 1: Various TCS Characterizations of the Data in Figure 2 

  m    h                         n    v̂                            w(v)/n 
  5    (0,20,40,60,80)           5    (5,5,3,2,1)                   64 
  5    (0,20,40,60,80)          10    (10,9,3,3,1)                  52 
  5    (0,20,40,60,80)          20    (18,15,6,4,2)                 45 
 10    (0,10,20,...,70,80,90)    5    (5,5,5,5,3,2,2,1,1,0)         58 
 10    (0,10,20,...,70,80,90)   10    (10,9,9,6,3,3,3,2,1,0)        46 
 10    (0,10,20,...,70,80,90)   20    (18,15,15,11,6,5,4,3,2,0)     40 


5.4 Adapting TCS in Continuous Mode Operation 

Since it is very easy to observe the actual number of threshold violations v̂, it is 
also easy to continuously adapt TCS to achieve a desired level of cell losses. If 
there are too many (few) cell losses then the value of v can be increased 
(decreased) incrementally. At all times it is necessary to maintain 

    v ≥ v̂ 

5.5 Characterization Under Uncertainty - Building in Slack 

To account for uncertainty in user behavior or insufficient user data, a slack or 
safety net of any desired size can be built into TCS. For example, we may have 
measured v̂ = (10,9,3,3,1) but want to allow more traffic than what has been 
observed so far. We can start with v larger than v̂ by any desired amount, say 
v = (10,10,5,5,2). Now w(v̂) = 520 while w(v) = 640 and this v has a 23% 
slack built in. 
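
The 23% figure can be verified by evaluating the maximum allowance of Section 5.2 for v = (10,10,5,5,2), with h = (0,20,40,60,80), the peak value of 100 cells per interval and n = 10 as in the earlier example:

```latex
\begin{align*}
w(v) &= 100\cdot 2 + 0\cdot(10-10) + 20\cdot(10-10) + 40\cdot(10-5)
        + 60\cdot(5-5) + 80\cdot(5-2) \\
     &= 200 + 0 + 0 + 200 + 0 + 240 = 640, \\
\frac{w(v)}{w(\hat{v})} &= \frac{640}{520} \approx 1.23 .
\end{align*}
```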

5.6 Cold Start 

With the adaptation mechanism defined earlier, it is possible to use TCS in a cold 
start mode, that is, with absolutely no information available about user traffic 
characteristics. We can start with v = (n,n,...,n) allowing peak rate traffic or a 
smaller value provided it has sufficient slack built into it to account for unexpected 
situations. Now, as data is collected, v can be decreased progressively. 

5.7 Advanced TCS-Based Policing Mechanisms 

The number of variables and how they are updated can be easily changed to 
implement many advanced features. 

Policing can be based on dual (or even multiple) TCS characterizations of user 
data. Thus, occasional data spikes can be characterized using a large window size, 
while the remaining data is characterized with a smaller window size. For 
illustrative purposes, let all cell counts in Figure 2 higher than 65 be considered as 
spikes in the data. There are four such counts in intervals 13, 24, 27 and 33. Thus, 
the primary TCS characterization will have a maximum threshold of 65. The 
spikes will have to be admitted with a secondary TCS characterization. We can 
choose these as follows: 

Primary TCS: n_1 = 10, m_1 = 4, h_1 = (0,20,40,65), v_1 = (10,9,3,0). 

Secondary TCS: n_2 = 100, m_2 = 2, h_2 = (0,20), v_2 = (4,0). 

Recall that for the data in Figure 2, we had measured v = (10,9,3,3,1) 
corresponding to h = (0,20,40,60,80). Our new characterizations correspond to 
these measurements. 

The policing mechanism is changed to allow the sum of the two characterizations. 
The primary TCS allows up to 65 cells a maximum of 3 times in every 10 time 
intervals, etc. The secondary TCS allows an extra 20 cells to be admitted, over and 
above what is admitted by the primary TCS, a maximum of 4 times in 100 time 
intervals. The primary TCS allows 45.5 cells per time interval on the average. The 
secondary TCS allows 0.8 cells per time interval on the average. The two together 
allow 46.3 cells which is significantly less than the 52 cells allowed by 
v = (10,9,3,3,1) originally. Thus, using multiple characterizations reduces 
overdimensioning. 
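
These averages follow from the maximum allowance of Section 5.2 applied to each characterization. In the primary TCS the last entry of v_1 is zero, so the term containing the highest allowed value vanishes whatever that value is:

```latex
\begin{align*}
w_1 &= 0\cdot(10-10) + 20\cdot(10-9) + 40\cdot(9-3) + 65\cdot(3-0)
     = 20 + 240 + 195 = 455, & \frac{w_1}{n_1} &= \frac{455}{10} = 45.5, \\
w_2 &= 0\cdot(100-4) + 20\cdot(4-0) = 80, & \frac{w_2}{n_2} &= \frac{80}{100} = 0.8, \\
\frac{w_1}{n_1} + \frac{w_2}{n_2} &= 46.3 .
\end{align*}
```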

A variation of the dual scheme is to let the second TCS characterization denote the 
number of (marked) cells which will be admitted once the first TCS 
characterization is exceeded. 

Another variation is to charge more for admitting under the second TCS 
characterization. 



6 CONCLUSIONS AND FUTURE WORK 

The types of user traffic characterization used currently have been shown to have 
several deficiencies. The peak rate based characterization is enforceable but leads 
to wasteful overdimensioning of the network. Long run statistical measures, such 
as the mean rate, standard deviation, etc., allow better allocation of network 
resources but cannot be effectively policed. We have developed a new 
threshold-based characterization which is almost completely general and has immense 
flexibility together with the ability to be very specific. It considerably generalizes 
the moving window and peak rate allocation schemes. A policing mechanism 
based on this characterization has a reasonably simple implementation. Advanced 
services can be built on this basic policing mechanism. There are several 
parameters which can be tailored to each user’s application. Easy-to-implement 
procedures are developed to obtain the remaining parameters directly from user 
data. The characterization can be used in a "cold start" mode when there is 
uncertainty about user behavior, and easily adapted as more data becomes 
available. 

One aspect that clearly stands out is the generality of TCS. TCS has a large 
number of parameters but many of these are simply selections that the network 
and/or user must make. The parameters can be chosen to allow a wide variety of 
traffic patterns with the shape of allowable patterns specified much more precisely 
than the Leaky Bucket. The parameters can also allow any amount of excess traffic 
as desired. Thus, TCS promises to be a widely used scheme due to its unique 
combination of almost complete generality and immense flexibility together with 
the ability to be very specific. 

We have developed TCS in its most general form. Simple ready-made versions of 
TCS with just a few thresholds will suffice for single users. More parameters will 
be used and more options allowed for services linking networks or involving a 
large amount of traffic. The "tailoring" of TCS is left for future work. 

TCS can be generalized by adding parameters which control the distance between 
successive threshold violations (see Section 4.2). The full development is left for 
future work. 

Finally, the performance of TCS should be studied and compared with that of the 
Leaky Bucket, the Generalized Leaky Bucket (Johri (1997)), and other schemes, 
for a variety of settings. This study will be similar to the one in Rathgeb (1993). 
This is also left for future work. 

7 REFERENCES 

Butto, M., Cavallero, E. & Tonietti, A., (1991), "Effectiveness of the Leaky Bucket 
policing Mechanism in ATM Networks," IEEE Journal on Sel Areas in Comm. 
9, 335-342. 

Choudhury, G. L., Lucantoni, D. M. and Whitt, W., (1996), "Squeezing the Most 
Out of ATM," IEEE T. on Comm. 44, 203-217. 

Doshi, B. T., (1994), "Deterministic Rule Based Traffic Descriptors for Broadband 
ISDN: Worst Case Behavior and Connection Acceptance Control," in The 
Fundamental Role of Teletraffic in the Evolution of Telecommunication 
Networks (Proceedings of ITC 14), edited by J. Labetoulle and J. W. Roberts, 
Vol. 1a, 591-600, Elsevier, New York. 

Elwalid, A. & Mitra, D., (1993), "Effective Bandwidth of General Markovian 
Traffic Sources and Admission Control of High Speed Networks," IEEE/ACM 
Transactions on Networking 1, 329-343. 

Johri, P. K., (1995), "Estimating Cell Loss Rates in High Speed Networks with 
Leaky Bucket Controlled Sources," Int. Journal of Comm. Systems 8, 303-312. 

Johri, P. K., (1997), "Apparatus and Method for a Generalized Leaky Bucket," 
United States Patent 5,625,622. 

Rathgeb, E. P., (1992), "Policing of Realistic VBR Video Traffic - A Case Study," 
IFIP Trans.: C Communications Systems 4, 287-300. 

Rathgeb, E. P., (1993), "Policing of Realistic VBR Video Traffic in an ATM 
Network," Int. J. of Digital and Analog Communication Systems 6, 213-226. 

Turner, J. S., (1986), "New Directions in Communications (or which Way in the 
Information Age?)," Proc. Zurich Sem. Digit. Comm., Zurich, Switzerland, 
March 1986, 25-32. 

Tutofor, K., (1990), "On Admission Control and Policing in an ATM-based 
Network," Proc. 7th ITC Seminar, New Jersey, October 1990, paper 5.4. 







How does an ATM switch see the 
traffic through the Leaky Bucket? 



I. Cselenyi, N. Bjorkman 

Broadband Team, Network Research Dept., Telia Research AB 
Rudsjöterrassen 2, S-13680 Haninge, Sweden, 

Phone: +46-8-707 5028, Fax: +46-8-707 5596, 

E-mail: istvan.i.cselenyi@telia.se 

S. Molnár 

HSN Laboratory, DTT, Technical University of Budapest 
Stoczek utca 2, H-1111 Budapest, Hungary, 

Phone: +36-1-463 3889, Fax: +36-1-463 3107, 

E-mail: molnar@ttt-atm.ttt.bme.hu 



Abstract 

This paper describes a new way of ATM traffic characterisation based on 
measurement. The method, called Leaky Bucket Analysis, can be directly used for 
quantitative analysis of burst structure, determination of adequate shaping rate, 
selection of sustainable parameter set, retrieving input parameters for source 
modelling, analysis of multiplexing performance and dimensioning ATM 
networks. The proposed method is based on the leaky bucket algorithm which is 
widely used in ATM context. The goal of our characterisation method is to reveal 
the important properties of the traffic just as the switch sees them. 

The usage and applicability of the Leaky Bucket Analysis method is 
demonstrated both on artificial traffic patterns and on 30 real traffic traces captured 
from different applications such as Internet, video conference, and LAN-LAN 
interconnection. Detailed conclusions are drawn from the results of Leaky Bucket 
Analysis establishing the relation between the features of the source, settings in the 
application and the gained characteristics. The robustness of the Leaky Bucket 
Analysis method is also examined; a safety margin is defined for each analysed 
traffic type, the impact of finite measurement trace is highlighted and the extension 
for the case of non-zero cell loss is illustrated. A number of application areas are 
illustrated at the end of the paper. 

1 INTRODUCTION 

Source characterisation is essential for prevention of congestion in ATM networks 
both on call level (Call Admission Control) and cell level (Usage Parameter 
Control or shaping). A new call can be admitted into the ATM network only if the 
involved ATM nodes can provide the network resources it intends to take, without 
affecting the QoS of previously accepted calls (Roberts, 1996). According to the 
standard (ITU-T I.371) the ATM terminal has to describe its traffic by giving the 
so-called sustainable parameter set (i.e. Sustainable Cell Rate, Maximum Burst 
Size, Peak Cell Rate and Cell Delay Variation Tolerance) to the access node. 
However, there are several problems arising from this concept. Firstly, this 
parameter set is capable of only a very rough description of the traffic and gives 
basically only the characteristics of a simple on-off source, which consumes much 
more network resources in terms of bandwidth and buffer space than the original 
traffic. The result is over-booking. Secondly, it is a difficult task for the terminal 
(or for the user in front of it) to give a realistic hint even for these few parameters. 

The only way to gain more, and more reliable, information about the source's 
behaviour in the buffer is to measure it. Our experience verified that the result of 
measurement based characterisation can be used as a priori information, since a 
finite threshold can be given for the deviation of repeated characterisations of the 
same source. Naturally, the threshold depends on the specific traffic type, e.g. 
Variable Bit Rate video, Internet or aggregate network traffic (see Section 5.1). 

Cell level congestion control takes place after a call is admitted. The network 
protects itself first through policing (UPC), which enforces the contract specifying 
the sustainable parameter set. In addition, some of the less delay sensitive traffic 
classes can be shaped. The most common method both for policing and shaping is 
to apply one or two Leaky Buckets (Elwalid, 1995). However, this is often 
inadequate, since one or two simple Leaky Buckets (LB) can not describe and 
control the overall burst structure of the traffic; they are coupled with a certain 
time-scale. Therefore a more complex analysis of incoming traffic should be 
performed on each time-scale. 

A number of different methods have been proposed (Frost, 1994, Stamoulis, 
1994, Roberts, 1996) for traffic characterisation. However, based on the 
aforementioned considerations a measurement based traffic analysis method is 
necessary which conforms with the sustainable parameter set, but is easy to fulfil and 
provides a more accurate description of required network resources on each time 
scale. The access node judges and sees the incoming traffic through the LB 
algorithm. Therefore, it is a logical idea to characterise the traffic source by 
calculating its resource consumption by the LB algorithm for all leak rates from 
zero to the full link rate. Procedures for selecting shaping rate and building up 
traffic models using the Leaky Bucket Analysis as traffic characterisation tool 
were presented in previous works (Latour-Henner, 1997, Cselenyi, 1997, Molnar, 
1996a). It is shown in this paper that the Leaky Bucket Analysis provides, besides 
the standardised traffic description parameters, additional information which can be 
applied for burst structure analysis, determining the sustainable parameter set, 
analysis of multiplexing performance and network dimensioning. 

The proposed Leaky Bucket Analysis (LBA) method is described in Section 2 and 
applied on artificial traffic patterns in order to get a formal description in Section 
3. As real validation of the method, LBA is also performed on different ATM cell 
streams captured by actual measurements. Several types of single and multiplexed 
traffic sources from different applications are characterised and the results are 
evaluated in Section 4. A robustness study of the Leaky Bucket Analysis method is 
presented in Section 5, giving a safety margin for each analysed traffic type and 
showing the impact of finite measurement traces and finite buffer size. In 
Section 6, a number of examples show how the proposed characterisation can 
be applied for different goals. Conclusions are given in Section 7. 



2 LEAKY BUCKET ANALYSIS 

2.1 The Leaky Bucket Curve 

The points of the leaky bucket (LB) curve give the maximum queue length as a 
function of service rate (i.e. leak rate) in case of a G/D/1 queue fed by the source to 
be characterised (Figure 1). The two parameters of the leaky bucket are 
represented by the two axes. The points of the LB curve give the proper leak rate 
(r) and queue length (q) pairs. In other words, this graph shows how much 
bandwidth and buffer space are consumed by the source in different working 
points. 
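
The points of an LB curve can be computed from a captured trace by replaying it through a queue drained at a constant rate. The following Python sketch does this for a small, hypothetical trace; the function names max_queue and lb_curve, the trace itself, and the choice to drain the queue continuously between arrivals (a fluid approximation of the deterministic server) are assumptions made for illustration. Times are measured in cell times, so the full link rate equals 1.

```python
from fractions import Fraction


def max_queue(arrival_times, leak_rate):
    """Maximum queue length seen just after an arrival, for a constant leak rate."""
    q = Fraction(0)
    q_max = Fraction(0)
    previous = None
    for t in arrival_times:
        if previous is not None:
            # Drain since the previous arrival; the queue cannot go below zero.
            q = max(Fraction(0), q - leak_rate * (t - previous))
        q += 1                       # the arriving cell joins the queue
        q_max = max(q_max, q)
        previous = t
    return q_max


def lb_curve(arrival_times, leak_periods):
    """LB curve sampled at leak time periods t, i.e. at leak rates r = 1/t."""
    return {t: max_queue(arrival_times, Fraction(1, t)) for t in leak_periods}


# Hypothetical trace: three bursts of 10 cells, one cell every 2 cell times,
# with bursts starting every 100 cell times.
trace = [start + 2 * j for start in (0, 100, 200) for j in range(10)]
curve = lb_curve(trace, range(1, 13))
for t in sorted(curve):
    print(t, float(curve[t]))
```

Exact fractions are used so that the computed points are not blurred by rounding.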

In case of a finite trace of bursty traffic, this is a monotonically decreasing curve, 
which starts from the total number of cells in the trace (n_T) and approaches zero at 
the peak cell rate (r_P) of the source (which is equal to the full link rate, r_L, in case 
of unshaped traffic). Because of the bursty nature and periodicity of the traffic, the 
curve has several linear segments. As an approximation, the LB curve can be 
replaced by a set of linear sections which can be determined by their slope (s_i). 
There are neighbouring sections whose slopes differ by one or more orders of 
magnitude, and others which have almost equal slopes. The number of section 
groups (called regions) containing sections with similar slope is equal to the 
number of burst levels (K) in the traffic pattern. The last breaking point (r_K - q_K), 
related to the highest burst level, is remarkable, since the slope following this point 
(s_K) is determined by the total number of cells in the trace, but this breaking point 
is independent of it. 

In contrast to the LB curve, the parameter triple of the traffic contract (r_P, r_S, n_S) 
gives only a line which assumes a much higher resource requirement (see bold line 
in Figure 1). 







Figure 1 Leaky Bucket Curve of a finite, bursty traffic trace and operating line 
determined by the sustainable parameter set. 

Ranging backwards from the peak rate to the mean rate, each section of the 
approximated curve can be described by the following equation: 

$$ q_i(r) = q_i + s_i (r_i - r) \quad \text{if } r_{i+1} < r < r_i \quad \text{and} \quad s_i = \frac{q_{i+1} - q_i}{r_i - r_{i+1}}. \tag{1} $$

The same equation using the leak time period (t = 1/r): 

$$ q_i(t) = q_i + s_i \left( \frac{1}{t_i} - \frac{1}{t} \right) \quad \text{if } t_i < t < t_{i+1} \quad \text{and} \quad s_i = \frac{t_i \, t_{i+1} \, (q_{i+1} - q_i)}{t_{i+1} - t_i}. \tag{2} $$

These equations can be used for finding the characteristic traffic parameters of the 
captured cell stream and for calculating the slope of the LB curve as a function of 
leak time period in order to analyse the burst structure. It is important to notice that 
cell rates are normalised with respect to the full link rate r_L. Therefore r_L = 1 and 
0 < r ≤ 1. 

2.2 The proposed analysis method 

The main steps of the Leaky Bucket Analysis are the following: 

1 . A sample trace should be captured from the investigated ATM source. 

2. The points of the LB curve should be calculated by post-processing the 
captured cell stream and the result should be approximated by a set of linear 
sections. 

3. The slope of each section should be calculated as a function of the leak time 
period using (2). That equation can be further simplified by a practical 
selection of section margins of leak time (t_i): 

$$ t_i = i \in [1, 2, 3, \ldots, K] \;\Rightarrow\; s(t) = t(t+1)(q_{t+1} - q_t) = t(t+1)\,\Delta q_t \;\Rightarrow\; \Delta q_t = \frac{s(t)}{t(t+1)} \tag{3} $$

which equation expresses how sensitive the buffer requirement of the source is 
to the changes in the allocated rate. 

4. The absolute value of the s(t) function should be plotted on a logarithmic 
scale, in order to highlight the characteristic sections of the LB curve. The 
value of t_K can be easily read in the LB slope graph by finding the last jump of 
the curve. Thus r_K can be calculated and the part of the LB curve depending 
on the number of captured cells can be determined. 

5. Adequate operating regions can be selected for resource allocation and call 
admission control, or leak rates can be found for policing or shaping based on 
the LB and LB slope curves (see Section 6). 

6. Other characteristic t and related Δq_t values can be determined by finding 
further jumps in the s(t) curve and applying (3). The related r_t - q_t pairs can be 
read from the LB graph and the burst size can be calculated using q_t (Cselenyi, 
1997). 

7. A quantitative description of burst structure can be retrieved from the LB 
slope curve by comparison to characteristics of multilevel on-off sources. Also 
source models can be established based on the scanned burst parameters. 

8. Since the LB curve gives the maximum queue length, i.e. the worst case 
behaviour of the analysed traffic, the maximum burst sizes can be read in the 
LB slope graph. The mean burst size can be determined by plotting the mean 
queue length and its slope, in a similar way. 

9. Apart from the exploration of burst structure of the traffic, the buffer overflow 
probability of single sources can be estimated using the three dimensional LB 
Analysis method (Cselenyi, 1996a). 
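
Steps 3 and 4 of the procedure can be sketched in Python as follows. The function name lb_slope and the LB points used in the example are hypothetical: they describe an assumed on-off source with 10-cell bursts and are not taken from a measured trace. The sketch evaluates s(t) = t(t+1)(q_{t+1} - q_t) for consecutive integer leak time periods and locates the jumps of the slope curve.

```python
from fractions import Fraction


def lb_slope(q):
    """s(t) = t (t + 1) (q[t + 1] - q[t]) for consecutive integer leak periods t."""
    return {t: t * (t + 1) * (q[t + 1] - q[t]) for t in sorted(q) if t + 1 in q}


# Hypothetical LB points q(t), indexed by the leak time period t = 1/r.
q = {1: Fraction(1)}
q.update({t: 10 - Fraction(18, t) for t in range(2, 11)})
q.update({t: 30 - Fraction(218, t) for t in (11, 12)})

s = lb_slope(q)
# Leak time periods at which the slope changes value (jumps of the s(t) curve).
jumps = [t for t in sorted(s) if t - 1 in s and s[t] != s[t - 1]]
t_K = jumps[-1]                    # last jump of the slope curve
r_K = Fraction(1, t_K)             # corresponding leak rate
print({t: int(value) for t, value in s.items()}, jumps, r_K)
```

In this example the slope curve has a plateau for t = 2,...,9 and a higher one from t = 10 onwards, so the last jump gives t_K = 10.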

3 CHARACTERISATION OF ARTIFICIAL SOURCES 

The proposed Leaky Bucket Analysis method is applied first to synthetic traffic 
patterns in order to get an analytical description. A simple, periodic on-off pattern 
is investigated first, since it was found in previous work (Cselenyi, 1997) that 
deterministic processes producing an on-off pattern play an important role in 
modelling VBR video sources. Two- and multilevel on-off patterns are 
investigated by LBA as an extension and generalisation of the simple on-off case. 

3.1 Simple on-off pattern 

An on-off pattern consists of a burst at peak cell rate (r_P = 1/T_P) with size n_M and 
duration T_B, followed by a silence period of duration T_S after which a new burst 
can be sent. The interarrival time of bursts is denoted by T_M, the total number of 
cells by n_T and the duration of the traffic trace by T_T in Figure 2. 




Figure 2 The traffic pattern of an on-off source. 

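There are three regions of the leak rate (r) where different expressions hold for the 
maximum queue length: 

$$ q_0(r) = 0 \quad \text{if } r_P \le r \le r_L, \tag{4a} $$

$$ q_1(r) = r_P T_B - r T_B = n_M \left( 1 - \frac{r}{r_P} \right) \quad \text{if } r_M < r < r_P, \tag{4b} $$

$$ q_2(r) = n_M \left( 1 - \frac{r}{r_P} \right) + (T_T - T_M)(r_M - r) = n_T - r \left( \frac{n_M}{r_P} + \frac{n_T - n_M}{r_M} \right) \quad \text{if } 0 < r < r_M. \tag{4c} $$
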

These equations are visualised in Figure 3 by the LB and LB slope curves. 

Figure 3 Leaky Bucket Curve and LB slope curve of a periodic on-off source. 

Noticeable are the three plateaus in the LB slope curve which correspond to the 
three regions of the LB curve (r_P - r_L, r_M - r_P and 0 - r_M). The breaking points of 
the LB curve represent the peak and mean rate, while the jumps in the LB slope 
graph highlight the burst interarrival times. The region with a slope equal to zero 
represents the lowest "burst" level (i.e. cell level), where the leak time period is 
less than T_P. This segment of the LB curve does not exist in case of unshaped 
sources. One can read the burst size (n_M) on the next burst level by extending the 
first section (s_1) of the LB curve beyond r_M toward the q axis. The duration of the on 
period (i.e. the first level burst, T_B) is indicated by the first plateau (s_1) in the LB 
slope graph. The junction point of the first and second sections represents the mean cell 
rate, maximum burst size parameter pair. The slope of the last section (s_2) is 
determined mainly by the duration of the traffic trace. 
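
The three expressions (4a)-(4c) can be combined into one piecewise function, sketched below in Python. The function name on_off_max_queue and the parameter values in the example are hypothetical. The sketch assumes T_B = n_M / r_P and a mean rate r_M = n_M / T_M, and it uses the first form of (4c).

```python
from fractions import Fraction


def on_off_max_queue(r, n_M, r_P, T_M, T_T):
    """Maximum queue length of a periodic on-off source at leak rate r."""
    r_M = Fraction(n_M) / T_M                  # mean cell rate (assumed n_M / T_M)
    if r >= r_P:
        return Fraction(0)                     # (4a): queue never builds up
    if r >= r_M:
        return n_M * (1 - r / r_P)             # (4b): one burst fills the queue
    # (4c): below the mean rate the backlog also accumulates over the trace.
    return n_M * (1 - r / r_P) + (T_T - T_M) * (r_M - r)


# Hypothetical source: 10-cell bursts at half the link rate, one burst every
# 100 cell times, trace of 300 cell times (30 cells in total).
params = dict(n_M=10, r_P=Fraction(1, 2), T_M=100, T_T=300)
for r in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 10), Fraction(1, 20), Fraction(0)):
    print(r, on_off_max_queue(r, **params))
```

At r = 0 the function returns the total number of cells in the trace, and extending the (4b) branch to r = 0 would give the burst size n_M, in line with the reading of Figure 3 described above.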



3.2 Two-level on-off pattern 



As an extension of the previous case, one can consider a state machine with three 
states. The source generates n_2 cells at peak cell rate during T_B2 in the first 
state, then waits T_S2 cell time until generating the next burst. It repeats this 
procedure n_3 times and waits T_S3 cell time before starting again. The resulting 
traffic pattern has consequently four burst levels, i.e. the LB curve has four sections 
(see Figure 4). The buffer requirement has the following expression in the four 
leak rate regions: 

$$ q_0(r) = 0 \quad \text{if } r_1 \le r \le r_L, \tag{5a} $$

$$ q_1(r) = n_2 \left( 1 - \frac{r}{r_P} \right) \quad \text{if } r_2 < r < r_1, \tag{5b} $$

$$ q_2(r) = n_3 n_2 - r \left[ \frac{(n_3 - 1)\, n_2}{r_2} + \frac{n_2}{r_P} \right] \quad \text{if } r_3 < r < r_2, \tag{5c} $$

$$ q_3(r) = n_T - r \left[ \frac{n_T - n_3 n_2}{r_3} + \frac{(n_3 - 1)\, n_2}{r_2} + \frac{n_2}{r_P} \right] \quad \text{if } 0 < r < r_3. \tag{5d} $$


Similarly to the previous case, the LB slope curve expresses the duration of bursts 
on each burst level. The burst interarrival times can also be easily read in that 
graph. 

Figure 4 Leaky Bucket characteristics of a two-level on-off pattern (slope of the 
LB curve versus leak time; the leak time axis is marked at 1, T_P, T_2/n_2 and 
T_3/(n_2 n_3)). 
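
The three-state source described above can be sketched as a small generator of cell arrival times. The function name two_level_pattern, the parameter values, and two modelling choices are assumptions made for illustration: a burst of n_2 cells is taken to last n_2 T_P cell times, and the long silence T_S3 is added after the n_3-th repetition of burst plus short silence.

```python
def two_level_pattern(n2, T_P, T_S2, n3, T_S3, periods):
    """Arrival times (in cell times) of a two-level on-off source."""
    times = []
    t = 0
    for _ in range(periods):
        for _ in range(n3):
            # Burst of n2 cells at the peak rate, one cell every T_P cell times.
            for j in range(n2):
                times.append(t + j * T_P)
            t += n2 * T_P + T_S2       # burst duration followed by short silence
        t += T_S3                      # long silence after n3 bursts
    return times


n2, T_P, T_S2, n3, T_S3 = 5, 2, 10, 3, 140
times = two_level_pattern(n2, T_P, T_S2, n3, T_S3, periods=2)

T_2 = n2 * T_P + T_S2                  # interarrival time of bursts
T_3 = n3 * T_2 + T_S3                  # interarrival time of burst groups
print(len(times), T_2 / n2, T_3 / (n2 * n3))
```

The two printed ratios are the leak times T_2/n_2 and T_3/(n_2 n_3) of this assumed source, the quantities marked on the leak time axis of Figure 4.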



3.3 Multilevel on-off pattern 

Further extending the number of burst levels yields a multilevel on-off pattern 
(Figure 5), whose LB curve consists of linear sections just as the approximated 
LB curve of bursty traffic drawn in Figure 1. The number of plateaus in the LB 
slope curve corresponds to the number of burst levels (K). The first burst level is 
the cell-scale where one cell makes the "burst" (i.e. n_1 = 1) followed by a silent 
period of T_S1 (which is zero in case of unshaped sources). An interesting way of 
modelling bursty sources is to consider the linear sections of its LB curve as an 
equivalent multilevel on-off source, calculate the burst parameters according to 
equation (5) and extend this model with information about the time domain 
behaviour of the source (e.g. sequence order of bursts) (Cselenyi, 1997). 

Figure 5 Multilevel on-off traffic pattern. 



The traffic pattern and the main parameters of a multilevel on-off source are 
presented in Figure 5, while the mapping between the parameters of the presented 
on-off patterns and this general case is given in Table 1. 

Table 1 Mapping the parameters of on-off traffic patterns 

K    Tm    Tp    Tbi   ni
2    T,    T,    1     1
3    T,    T,    1     1
K    Tk    T,    1     1



4 CHARACTERISATION OF REAL ATM TRAFFIC 

Several hundred traces from many different ATM traffic types were captured 
by measurements at Telia Research, Sweden, during the last couple of years 
(Bjorkman, 1995, Molnar, 1996b, Cselenyi, 1997). These cell streams were 
analysed by the proposed method. To verify its applicability, exemplary 
results are presented and evaluated together with an explanation of the results. 

4.1 Multiplexed VBR traffic sources 

Traffic of multimedia workstations connected via the Stockholm Gigabit Network 
(an ATM MAN) was multiplexed with CBR background traffic (Molnar, 1996b). 
Long traces of both traffic types were captured before and after multiplexing. LB 
curves of a shaped single multimedia source (a) and the aggregate traffic of four 
workstations (b) are depicted in Figure 6. The parameters and traditional traffic 
characteristics - such as mean cell rate and burstiness (i.e. squared coefficient of 
cell interarrival time variation) - of the traces are presented in Table 2. 
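The two traditional descriptors named above can be computed directly from cell timestamps. This is a minimal sketch under stated assumptions: the function name is hypothetical, timestamps are taken to be in seconds, the mean cell rate is taken as the reciprocal of the mean interarrival time, and the population variance is used for the squared coefficient of variation.

```python
def mean_rate_and_burstiness(times):
    """Return (mean cell rate, squared coefficient of variation of interarrival times).

    times: increasing cell arrival timestamps in seconds (at least two cells).
    """
    gaps = [b - a for a, b in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps)
    var_gap = sum((g - mean_gap) ** 2 for g in gaps) / len(gaps)
    return 1.0 / mean_gap, var_gap / mean_gap ** 2


# Perfectly periodic cells have zero burstiness; uneven gaps raise it.
print(mean_rate_and_burstiness([0.0, 1.0, 2.0, 3.0]))
print(mean_rate_and_burstiness([0.0, 1.0, 4.0]))
```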




Figure 6 Multiplexing VBR video sources: (a) single VBR source before 
multiplexing, (b) aggregate of four sources after multiplexing (x-axes: Leak Rate 
[cps]; Leak Time [cell time]). 

The four sources were shaped to 34 Mbps each, in order to avoid congestion in the 
first switch. Naturally, this peak rate limitation does not hold for the aggregate 
traffic (b), as is evident in the LB slope graph from the missing 0th-level plateau. 
The applied shaping rate can be identified by the position of the first jump in the LB 
slope curve of trace (a). There were several ATM switches and SDH add/drop 
multiplexers on the path of the traffic from the source to the measuring point. 
These active ATM and SDH devices destroyed the burst levels of the original VBR 
traffic (e.g. sequences c,d,e) by splitting and merging the bursts on different levels. 



4.2 Multiplexed CBR and VBR traffic 

VBR traffic produced by multimedia workstations and CBR traffic as a 
background load were fed into a FIFO multiplexer with no prioritisation. Several 
traces were captured from the VBR traffic of the same input video sequence after 
multiplexing. In spite of the increasing background CBR load, the characteristics of 
traces (c-e) coincide very well in both graphs. Four plateaus, i.e. burst levels, can be 
identified in the LB slope curve of each sequence and only one of them differs 
significantly, namely the third, in the leak time region of 18 - 27 cell times. 
Another noticeable difference is that the more the background CBR load increases, 
the lower the slope curve starts. A possible explanation of this observation is that 
the heavy CBR stream spaced out the small, bursty VBR traffic. Since no cell loss 
occurred, the mean cell rate (r_K) remained unchanged, but the other burst levels 
(2nd and 3rd) were changed. 





Figure 7 VBR video source multiplexed with CBR traffic in case of (c) @100 
Mbps, (d) @120 Mbps and (e) @140 Mbps background load. 



4.3 VBR video traffic 

The effect of changing the input video sequence was analysed by measuring the 
traffic of a TCP/IP over ATM based video conference application transferring 
several standardised CCIR video sequences which had equal duration but different 
burst structure corresponding to the picture content (Figure 8). The mean cell rate 
and the r_K - q_K point of sequence (f), which is an almost “still picture video”, are 
far from the other sequences. However, the first two plateaus of the LB slope curves 
coincide for every video sequence. A possible explanation is that the VBR 
video source has a very deterministic nature. The video application produces video 
frames at a given frame rate. These frames are first sliced into Maximum Transfer 
Units (MTU), then packed into IP packets and segmented into ATM cells by the 
protocol stack. This results in a multilevel burst structure, where only the size and 
timing of the top-level burst (i.e. video frame) is different. The number of captured 
cells (see Table 2) determines the s_K values, as expected. 

Figure 8 Leaky Bucket Curves of several video sequences: (f) Girl with Toys, 
(g) Susie, (h) Table Tennis, (i) Tempest, (j) Flower Garden, (k) Popples. 

The effect of changing the frame rate as a performance parameter of the video 
application was also investigated. It can be seen from the result in Figure 9 that the 
LB curves are shifted along the x-axis but the slope values are equal. The slope 
curves show that the burst structure differs only on the highest level, i.e. video 
frame level, while the MTU and IP levels are equivalent. 

Figure 9 Effect of changing frame rate (l) @10 fps, (m) @20 fps, (n) @25 fps. 

One group of media parameters has no impact on the network resources (e.g. 
brightness) while the other parameter group changes the amount of data to be 
transferred. The frame rate parameter belongs to the second group, as can be 
seen in Figure 9. Although the amount of data is linearly proportional to the frame 
rate, the (m) and (n) curves are much closer to each other than to curve (l). A 
possible explanation for this phenomenon is that there is a finite upper limit to the 
performance of the multimedia workstation, and it cannot really support real-time 
pictures (25 fps). The LB slope curve of trace (n) supports the hypothesis regarding 
the saturation of workstation performance. 



Table 2 Traditional characteristics of VBR traffic sources 

Trace  Trace       Total Number  Mean Cell   Burstiness  Comments
       Length (s)  of Cells      Rate (cps)
a      294.42      1 627 720     5500        140.3063    shaped
b      68.92       1 000 000     14 500      133.3298    aggregate
c      19.54       253 627       12 900      700.6972    bgr. 100 Mbps
d      20.38       264 258       12 900      693.0733    bgr. 120 Mbps
e      20.14       260 080       12 900      680.9803    bgr. 140 Mbps
f      51.00       100 309       1960        179.8622    10 fps
g      45.60       138 453       3030        258.6941    10 fps
h      36.86       151 544       4110        345.9633    10 fps
i      45.68       307 471       6720        483.3476    10 fps
j      38.27       250 142       6540        486.7423    10 fps
k      49.67       293 180       5890        452.2486    10 fps
l      35.73       87 353        2440        197.9074    10 fps
m      32.41       117 899       3640        245.1639    20 fps
n      45.29       186 966       4120        226.8030    25 fps

4.4 Internet traffic 



Internet traffic is in the spotlight of many recent research activities. The traffic of 
fundamental TCP/IP based applications such as WWW browsing, FTP and Ping 
was analysed. The traditional traffic characteristics are given in Table 3. 

While traces (o,p) were being captured, a large image was downloaded several times 
with a WWW browser. The r_K - q_K and r_2 - q_2 points of curve (o) are very close 
to each other, which means that it has a very simple burst structure determined by 
only a few burst levels (Figure 10). 




Figure 10 LB Curves of WWW browsing: (o) real and (p) emulated user (x-axes: 
Leak Rate [cps]; Leak Time [cell time]). 

Figure 11 Leaky Bucket Curves of file transfer sessions. 

A large file was downloaded during the FTP session (Figure 11). The LB 
slope curves of these sources (q,r) indicate a very simple burst structure. The 
similarity of the FTP traffic characteristics across repeated measurements, which we 
also observed in many more trials (Molnar, 1996b), is remarkable as well. 



Table 3 Traditional characteristics of Internet traffic sources 

Trace  Trace       Total Number  Mean Cell   Burstiness  Comments
       Length (s)  of Cells      Rate (cps)
o      11.45       22 889        1900        142.7399    real user
p      12.20       22 888        1880        150.4459    SUE
q      1.51        81 117        6843        680.8446
r      1.48        81 116        6718        759.9904
s      30.37       230 732       7573        448.6227    128 cells
t      29.94       286 945       9583        444.6371    256 cells
u      29.98       233 723       7772        471.4246    512 cells
v      29.83       290 948       9726        458.3453    1024 cells



The ping UNIX command was used with four different sizes of transferred 
message (128, 256, 512 and 1024 cells). The values of s_1 and q_2 are proportional to 
the message size for these four sources (s-v). It is noticeable in Figure 12 
that the LB slope curve of each Ping trace begins with a value which corresponds 
to the packet size setting of the trace, i.e. 164, 240, 475 and 1180 cell times for the 
(s-v) traces, respectively. Although these values do not exactly follow the 1:2:4:8 
ratio, the error is due to the fact that LBA captures the worst case behaviour of the 
traffic (exactly as the switch does). 

Figure 12 Leaky Bucket Curves of traffic from ping sessions. Message size: 
(s) @128, (t) @256, (u) @512, (v) @1024 cells. 
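How far the initial slope values of the ping traces deviate from the 1:2:4:8 ratio of the message sizes can be checked with a few lines of arithmetic. The sketch below only restates the numbers quoted in the text; the variable names are illustrative.

```python
message_sizes = [128, 256, 512, 1024]   # cells, traces (s) to (v)
first_slopes = [164, 240, 475, 1180]    # cell times, read from the LB slope curves

# Normalise both series to the smallest message so the ratios can be compared.
size_ratio = [s / message_sizes[0] for s in message_sizes]
slope_ratio = [round(v / first_slopes[0], 2) for v in first_slopes]

print(size_ratio)    # nominal 1:2:4:8
print(slope_ratio)   # measured ratio of the initial slope values
```

The measured ratios grow with the message size but stay below the nominal doubling at every step.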



4.5 Aggregate traffic 

Internet traffic of interconnected LANs was also measured on the Swedish 
University Network (SUNET). Since these traces contain more than 83 million 
cells, these curves have large q_K and large slope values (Figure 13). Although the 
traces were taken at different times, it is interesting to recognise the similarity 
among the curves from independent measurements in both graphs. The mean cell 
rate varied between 3000 and 9000 cps. The burst structure is very dispersed, 
similarly to the other aggregate trace of four video sources (b) in Figure 6. 

Figure 13 Leaky Bucket Curves of aggregate traffic measured on SUNET (x-axes: 
Leak Rate [cps]; Leak Time [cell time]). 

5 ROBUSTNESS OF THE LEAKY BUCKET ANALYSIS 



An important feature of a practicable characterisation tool is that it highlights the 
essential characteristics of the traffic and hides the unnecessary variations. 
Therefore an ideal tool gives exactly the same descriptors for each trace of a 
specific traffic, independently of the length of the trace and the time when it was 
captured. The impact of these two factors on the result of the proposed Leaky 
Bucket Analysis method is presented in this section. 

5.1 Repeatability 

Many traces were captured from the same traffic type, using the same quality 
settings (e.g. video frame rate) in order to examine the repeatability of the method. 
The traffic types differ in how deterministic their cell generation is. 
The VBR video traffic is easily reproducible, since the input video 
sequence can be repeated and most of the processes transforming and transferring 
the data inside the multimedia workstation are of a deterministic nature (Cselenyi, 
1997). The only non-deterministic factor is the software application that gets the 
coded input video from the video card and sends the processed data unit to the 
protocol stack according to the scheduler of the operating system. That is why it 
can be assumed that traces taken from a VBR video traffic source have the same 
significant characteristics. This is visible in Figure 14, which shows the 
relative error ratio of LB curves calculated for traces of the same source from 
different trials: 






$$\dots \qquad \text{for } \forall j, k,\ j \ne k. \tag{6}$$



Safety margins can be defined by giving accuracy thresholds and the leak rate (or 
leak time) regions where they are valid. The safety margins are indicated by bold 
lines and the maximum, minimum and mean error curves are drawn in Figures 14-17. 
The relative error has a peak between 20 and 30. This means about 40 ms, i.e. 
the video frame rate set in the video application. As described above, the real 
frame rate is determined by the scheduling system and the upper limit of the terminal’s 
performance. This is a reasonable explanation for the degradation of repeatability 
in this region. 

Figure 14 Repeatability of VBR video traffic. 

In contrast to VBR video traffic, for Internet traffic one can only assume that the 
sources behave similarly in a statistical sense. Therefore, to obtain a 
robust statistical average, hundreds of traces should be taken from the same traffic. 
Repeating the measurement is quite easy in the case of simple “Internet sessions”, like 
Ping and FTP. According to Figure 15, the behaviour of these sources is very 
deterministic. The only jump in relative error can be seen around the r_K breaking 
point. Repeating more complex sessions, like WWW browsing, is more 
problematic. Thus the examined WWW session was repeated using a software tool 
called Service User Emulator (Cselenyi, 1996b), which can record and play back 
the interactions of the user and in this way generates repeatable traffic. The emulated 
user could repeat the WWW session quite well, as can be seen in Figure 10. 

Figure 15 Repeatability of ping sessions. 

The safety region of the acceptable error threshold (50%) is much shorter for 
aggregated Internet traffic than for simple sources. The main reason is the much 
higher number of independent processes which are generating or influencing the 
traffic. Considering the other safety margin, it is interesting that any bandwidth can 
be allocated if the buffer requirement is doubled. 
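The minimum, mean and maximum relative error between LB curves of repeated trials can be sketched as follows. The exact definition used for the error curves is not recoverable here, so the pairwise form |q_j(r) - q_k(r)| / q_k(r) over all j ≠ k is an assumption, as are the function name and the requirement that all curves are sampled on the same leak-rate grid.

```python
from itertools import permutations


def relative_errors(curves):
    """Per grid point, return (min, mean, max) pairwise relative error.

    curves: list of LB curves, each a list of buffer requirements sampled
    on one shared leak-rate grid (an assumption of this sketch).
    """
    stats = []
    for i in range(len(curves[0])):
        errs = [abs(a[i] - b[i]) / b[i]
                for a, b in permutations(curves, 2) if b[i] > 0]
        if not errs:                       # all curves are zero at this point
            stats.append((0.0, 0.0, 0.0))
        else:
            stats.append((min(errs), sum(errs) / len(errs), max(errs)))
    return stats


# Two hypothetical trials sampled at two leak rates.
print(relative_errors([[10.0, 4.0], [8.0, 4.0]]))
```

A safety margin in the sense of the text would then be a leak-rate (or leak-time) region in which the maximum error stays below a chosen threshold.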




Figure 16 Repeatability of aggregate Internet traffic. 



5.2 Effect of trace length 



One trace of measured video traffic, captured after multiplexing with CBR 
background traffic, was sliced into two and four parts in order to investigate the 
effect of the total cell number on the shape of the LB curve. The LB slope curves of 
these traces are depicted on a linear scale in Figure 17. 











Figure 17 Effect of the number of cells in the measured trace (curves for the 
full-length, half-length and quarter-length trace; x-axis: Leak Time [cell time]). 

The results confirm the theoretical assumptions presented in Section 2. The value 
of the last slope (s_K) of each trace is proportional to the number of cells in the trace 
(Table 2). However, the LB curves are almost identical for the r_K < r (or t_K > t) part, 
supporting the proposed steps of our method. 

5.3 Cell loss probability estimation 

The Cell Loss Probability can be estimated by the overflow probability calculated 
by post-processing of measurement traces. In this way a set of LB curves can be 
retrieved corresponding to different loss probabilities (see Figure 18). Instead of 
the original zero loss curve, the adequate LB curve can be applied for the analysis 
and resource allocation task. More details can be found in (Cselenyi, 1996a). 
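One way to obtain such loss-parameterised curves from a trace is to record the bucket content seen at every arrival and take the buffer size that is exceeded by at most a target fraction of arrivals. This is a hedged sketch of that idea, not the procedure of (Cselenyi, 1996a); the function names and the continuous-drain bucket convention are assumptions.

```python
import math


def backlogs(times, leak_rate):
    """Bucket content just after each arrival (infinite bucket, continuous drain)."""
    backlog, last, out = 0.0, times[0], []
    for t in times:
        backlog = max(0.0, backlog - leak_rate * (t - last)) + 1.0
        out.append(backlog)
        last = t
    return out


def buffer_for_overflow(times, leak_rate, target):
    """Smallest buffer exceeded by at most a fraction `target` of arrivals."""
    samples = sorted(backlogs(times, leak_rate))
    allowed = min(math.floor(target * len(samples)), len(samples) - 1)
    return samples[len(samples) - 1 - allowed]


burst = list(range(10))                       # ten back-to-back cells
print(buffer_for_overflow(burst, 0.5, 0.0))   # target 0 gives the zero-loss value
print(buffer_for_overflow(burst, 0.5, 0.25))  # a smaller buffer for a looser target
```

Repeating the second call over a grid of leak rates gives one LB-type curve per target overflow probability.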




Figure 18 LB-type curves of trace (b) parameterised by estimated CLR. 



6 APPLICATION OF THE LEAKY BUCKET ANALYSIS 

The aim of this section is to give examples of how the proposed leaky bucket 
algorithm can be applied in practice. Based on the results presented in Section 4, 
different characterisation goals are achieved. Other applications, such as 
determining the shaping rate, retrieving parameters for a VBR source model and 
network dimensioning, were described in previous works (Bjorkman, 1995, 
Molnar, 1996a, Latour-Henner, 1997, Cselenyi, 1997). 

6.1 Burst structure analysis 

One possible application of the LBA is to detect the burst structure of the analysed 
traffic source. The Leaky Bucket curve and the LB slope graph of real traffic 
sources are very similar to that of the two-level on-off traffic drawn in Figure 4, 
hence it seems to be logical to apply the equations (5a-d) for determining the 
parameters of burst structure (i.e. duration and interarrival time of bursts at each 
level). The parameters read in the graph are written in the left column while the 
calculated parameters in the right: 

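$$T_{B2} \approx 300 \;\Rightarrow\; T_{B2} = 300 \tag{6a}$$

The value of n_2 can be read more precisely in the LB graph and calculated using 
(5b). With that value: 

$$\frac{T_2}{n_2} = 10,\; n_2 \approx 180 \;\Rightarrow\; T_2 = 1800 \tag{6b}$$

$$T_{B2} + (n_3 - 1)\,T_2 \approx 15800 \;\Rightarrow\; n_3 \approx 9.7 \tag{6c}$$

$$20 < \frac{T_3}{n_2 n_3} < 21,\; n_2 n_3 \approx 1750 \;\Rightarrow\; 35000 < T_3 < 37000 \tag{6d}$$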






Therefore there are active periods of duration 300 cell times followed by a silent 
period of 2700 cell times on the first burst level, according to observations (6a) and 
(6b). The size of this burst (n_2) is determined by the Maximum Transfer Unit of 
the video application (approx. 8200 bytes). On the next level, about nine first-level 
bursts make a longer burst of duration 15800 (6c). This is the largest video frame 
in the trace. The range of interarrival times of these frames given by (6d) 
corresponds to the frame rate setting in the application, i.e. 10 fps. These 
observations support the results of previous work (Molnar, 1996b). 
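The arithmetic behind these readings can be retraced in a few lines. This is a sketch only: the variable names are illustrative, and the conversion from cell times to seconds assumes that one cell time is 1/366792 s, a value inferred from the peak cell rate of 366792 cps quoted in Section 6.2.

```python
t_b2 = 300                      # first-level burst duration read from the graph, cell times
n2 = 180                        # first-level burst size, cells
t2 = 10 * n2                    # T_2 / n_2 = 10 gives the first-level burst period

# Solve T_B2 + (n_3 - 1) * T_2 = 15800 for n_3.
n3 = 1 + (15800 - t_b2) / t2

# Frame interarrival range quoted in the text, converted to frames per second.
CELLS_PER_SECOND = 366792       # assumed link cell rate (see lead-in)
fps_high = CELLS_PER_SECOND / 35000
fps_low = CELLS_PER_SECOND / 37000

print(t2, round(n3, 2), round(fps_low, 2), round(fps_high, 2))
```

Under these assumptions the frame rate falls between roughly 9.9 and 10.5 frames per second, which is compatible with the 10 fps application setting.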

6.2 Giving hints for the sustainable parameter set 

Assuming that the sustainable parameter set should be determined for a VBR video 
source like (c-e), first a region is to be selected which is insensitive to changes in 
the environment (i.e. background load in this case). In the LB slope graph of 
Figure 7, the region related to leak times between 11 and 18 cell times seems to be 
adequate. After checking the safety range in Figure 14, this interval can be further 
reduced to t_s = 12, i.e. the corresponding sustainable cell rate is r_s = 30566 cps. The 
maximum burst size q_s and the peak cell rate r_p can be read in the LB graph. 
However, the former should be increased according to the safety threshold (25% 
for this t_s) for a robust decision. Therefore r_p = 366792 cps and q_s = 439 cells. 
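The relation between leak time and leak rate, and the effect of the safety threshold, can be illustrated as follows. In this sketch the raw burst size read from the graph (351 cells) is a hypothetical value chosen only so that the 25% margin reproduces the quoted 439 cells; the function name is likewise an assumption.

```python
import math

PEAK_CELL_RATE = 366792          # cps, the r_p value quoted in the text


def leak_time_to_rate(leak_time):
    """Leak rate in cps for a leak time given in cell times."""
    return PEAK_CELL_RATE / leak_time


r_s = leak_time_to_rate(12)      # sustainable cell rate for t_s = 12

q_read = 351                     # hypothetical reading from the LB graph, cells
q_s = math.ceil(q_read * 1.25)   # enlarge by the 25% safety threshold

print(r_s, q_s)
```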

6.3 Analysing multiplexing performance 



LBA is suitable for analysing the effect and performance of active network 
elements, such as multiplexers and shapers. Figure 6 presents the LB characteristics 
of a video stream before multiplexing (a) and of the aggregate traffic of four 
multiplexed video streams (b). The third curve illustrates the theoretical maximum 
resource requirement of the four sources, i.e. the case in which they consume the most 
bandwidth and buffer space. It is evident that statistical multiplexing performed 
quite well, since the LB curve of the real aggregate traffic (b) is much lower than that of 
the estimated worst case. Naturally these observations can be quantified based on 
the LB graphs as well. 



7 CONCLUSIONS 

In this paper, we introduced a simple and practical traffic characterisation method 
called Leaky Bucket Analysis. The procedure of LBA starts with capturing a trace 
from the examined traffic and calculating the maximum buffer requirement as a 
function of the LB leak rate. Approximating the LB curve by a set of linear sections 
and plotting the slope of these sections as a function of leak time are proposed for 
highlighting the burst structure of the source and allowing quantitative analysis. The 
main advantage of this method is that it gives the resource requirement and 
describes the burst structure of the source on each time scale in one graph, while 
the standardised source descriptors and usual burstiness measures are time scale 
dependent. Moreover, the Leaky Bucket Analysis method provides a quick visual 
impression of the traffic structure. 
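The step from a piecewise-linear LB curve to its slope graph can be sketched as below. The function name and the breakpoint values are hypothetical; the sketch assumes the curve is given as (leak rate, buffer requirement) breakpoints sorted by decreasing leak rate, and reports each section by the leak time (the reciprocal of the leak rate) at which it starts.

```python
def slope_curve(points):
    """Return (starting leak time, slope magnitude) for each linear section.

    points: (leak_rate, buffer) breakpoints sorted by decreasing leak rate.
    """
    sections = []
    for (r_hi, q_hi), (r_lo, q_lo) in zip(points, points[1:]):
        slope = (q_lo - q_hi) / (r_hi - r_lo)   # buffer growth per unit of rate reduction
        sections.append((1.0 / r_hi, slope))
    return sections


# Hypothetical breakpoints: a 5-cell burst lasting 10 cell times, three bursts per group.
breakpoints = [(0.5, 0.0), (0.125, 3.75), (0.0375, 11.625)]
for leak_time, slope in slope_curve(breakpoints):
    print(leak_time, round(slope, 6))
```

For these example values the two plateaus are 10 and 90 cell times, i.e. the durations of the burst and of the group of bursts, which is the kind of information the slope graph is meant to expose.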

The method has been described and demonstrated on 30 traces taken from actual 
traffic measurements of VBR video, Internet, ATM LAN and MAN traffic. Both 
single sources and traces from aggregate traffic (e.g. after multiplexing) have been 
analysed. The results show that, besides the basic traffic characterisation parameters, 
additional information can be gained about the burst structure and periodicity of 
ATM traffic. The robustness of LBA is analysed and (safety margin, error threshold) 
pairs are defined for the presented traffic types. Several application areas of LBA, such 
as burst analysis, selection of a sustainable parameter set, analysis of multiplexing 
performance and dimensioning of ATM networks, are also presented in the paper. 

Because of its simplicity, it would be possible to implement the LBA method as 
an integral part of ATM network elements. The practical applicability of the 
method for network dimensioning and CAC design is the focus of our recent 
and future research activities. 

Abstract 

Virtual paths (VPs) facilitate rapid movement of end-to-end traffic streams 
in ATM networks by keeping processing at intermediate nodes to a minimum. 
For low volumes of traffic, however, VPs may suffer from poor statistical 
multiplexing gain, and thus result in low utilisation of transmission resources. 
Balancing between processing and utilisation, we study a procedure whereby 
lightly loaded VPs are decomposed into at most two segments, which require 
only one intermediate switching for an end-to-end traffic stream. The resulting 
segments are then aggregated with other segments or existing VPs. The 
resulting VPs can thus achieve higher multiplexing efficiency for marginally 
increased processing. For our test networks, experimental results show that 
about 95% of the lightly loaded VPs will be decomposed and aggregated by 
our procedure. When applied to networks where the VPs are adjusted over 
time to meet traffic variations, the procedure results in an increase in carried 
traffic and reduced overall overhead by providing a more stable VP network 
configuration with fewer modifications. 



Keywords 

Virtual path network, virtual path routing, virtual channel routing. 



1 PRELIMINARIES 



1.1 Definition of Virtual Paths 

The virtual path (VP) and virtual path connection (VPC) are important 
concepts in ATM networks. A VP recognises the distinct identity of a traffic 
stream between two nodes. Cells belonging to a particular VP are identified 
by a common, fixed VP identifier (VPI). A VPC consists of a series of 
concatenated VPs (possibly with different VPIs) and specifies a route to be traversed 
by a traffic stream from an originating node through a number of intermediate 
nodes to a terminating node. Using VPCs allows for faster set-up of new 
connections (along predefined routes) and rapid movement of traffic (ATM cells) 
with minimal processing at intermediate nodes (which is related to switching 
costs at each node). 

A VPC may also be assigned a certain bandwidth, i.e. a certain part of 
the bandwidth on each of the transmission links of the VPC may be reserved 
for its exclusive use. This accelerates the set-up of new connections (since 
exclusive ownership of bandwidth means that the availability of resources can 
be determined over the entire VPC). 

A further simplification and speed-up is obtained if service classes are 
segregated into logically distinct VPCs such that each service class has its own 
logically independent VPC network. With homogeneous traffic, the bandwidth 
reserved on a VPC may be expressed in terms of the maximum number of 
simultaneous connections it can support before violating some quality-of-service 
(QoS) constraints (e.g. cell loss). We call this number the “equivalent number 
of circuits” (or just “circuits” for short) for the traffic type considered, and 
express VPC bandwidths in terms of these rather than bits per time unit. 

We briefly mention that the calculation of the number of circuits from a 
given bandwidth is a complicated process which is done independently of the 
work presented here. It involves traffic characteristics (e.g. burstiness on the 
cell and burst scales and QoS constraints), and system properties (e.g. buffer 
capacities and strategies, and service policies). 



1.2 Management of Virtual Paths 

Because of the number of routes that traffic between a pair of end nodes 
may traverse, and the number of all possible pairs of end nodes with 
specified traffic requirements, a frequent use of VPCs may give rise to inordinately 
large numbers of them. This may cause problems in two areas of ATM network 
management. One problem area is the management of VPIs. The maximum 
number of VPIs is limited by the field length allowed for VPIs in ATM cell 
headers. The other problem area, which occurs only if VPCs are associated 
with reserved bandwidths, is bandwidth management, or efficient 
utilisation of the transmission capacity on transmission links. This is particularly 
evident for VPCs whose traffic varies over time (thus not permitting full 
utilisation during off-peak periods), and VPCs with small volumes of traffic 
(which do not allow full exploitation of statistical multiplexing). 

In a series of works, e.g. (Arvidsson 1994, Arvidsson 1995), we have studied 
automatic reconfiguration of VPCs as a remedy for efficiency problems 
associated with time-varying traffic. The basic concept is to reorganise the network 
of VPCs with respect to routes and bandwidths over time, in accordance 
with current demands. We have tried both on-line methods (based on short 
time traffic demand sampling followed by VPC network redesign) and off-line 
methods (where VPC networks are designed according to averages of several 
demand measurements over the same traffic period). In this paper we extend 
our work and for the first time address the efficiency problems associated with 
low traffic volumes on some VPCs. 



2 STATEMENT OF PROBLEM 



2.1 Virtual Path Efficiency 

In network planning, design and management, there often exist competing 
goals and objectives, between which network planners have to strike a balance 
and arrive at a happy medium. One such case is ATM VPC routing, where 
the wish for simplicity in establishing, maintaining, and clearing connections 
contradicts the wish for high utilisation of the transmission network. 

A simple VPC identifies a route for a traffic stream between a pair of end 
nodes. Simplicity is achieved if traffic streams between pairs of end nodes are 
carried by end-to-end paths. This is because intermediate nodes mean 
individual routing of each request in addition to intermediate switching of the traffic 
flows, both of which incur processing time and require processing logic that 
translates into additional equipment costs. Moreover, intermediate switching 
incurs increased cell delay and loss through the buffers involved, which again 
translates to costs in terms of additional equipment requirements. A more 
advanced VPC may have a certain transmission capacity committed along 
its route. This simplifies the establishment and clearing of a connection since 
these can be done end-to-end. The result is less pressure on the processing 
logic, which translates into reduced equipment costs. Finally, a VPC may be 
devoted to a particular service class in order to simplify call acceptance control 
(CAC), by means of the equivalent circuit concept, and QoS management, 
through the distinction of different requirements provided by the VPIs. 

Separate VPCs for each service class with dedicated capacity mean that 
VPCs and their committed circuits are treated as distinct inviolable entities, 
and we do not allow any statistical sharing of circuits between VPCs. 
Network management, QoS management and connection management of VPCs 
and VCCs are indeed simplified this way. On the other hand, resources on 
transmission links are best utilised when there is maximum sharing among 
all VPCs traversing common transmission links. We are thus faced with the 
problem of choosing between reduced management effort and high link 
utilisation. 

To illustrate the point, consider Figure 1. The diagram to the left shows the 
number of intermediate switching points per flow vs. the size of the VPC in terms 
of number of hops. Though the numbers are not very interesting in themselves, 
it is immediately clear that longer VPCs mean less intermediate switching. 
(The figures given refer to our specific test networks, which are described 
in more detail below.) The middle and right diagrams show utilisation on 
the call and burst scale respectively vs. multiplexed number of channels and 
number of sources respectively. Again the numbers are not very interesting 
in themselves, but we notice that the larger the traffic, the higher the utilisation 
for a constant QoS. It is also seen that there appears to be a limit above 
which further volumes do not significantly improve the utilisation. (The 
middle curve is computed from the Erlang-B model with a call loss probability 
QoS requirement of 10^{-…}. The right curve is computed from the (not so realistic) model 
of (Anick et al. 1982) where independent, statistically identical sources alternate 
between independent, exponentially distributed on- and off-periods and 
transmit at peak rate during the former, where the peak to mean ratio is 10, the 
buffer size is 10 average bursts, and the cell loss QoS requirement is 10^{-…}.) 

Figure 1 Additional switching points vs. VPC length (left). Utilisation 
efficiency vs. traffic volume on the call scale (middle) and burst scale (right). 
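The call-scale effect can be reproduced with the standard Erlang-B recursion. The sketch below is illustrative only: the 1% blocking target is an assumption (the target used for the figure is not legible here), and the function names and the bisection search are choices made for this example.

```python
def erlang_b(circuits, offered):
    """Erlang-B blocking probability via the standard recursion."""
    b = 1.0
    for n in range(1, circuits + 1):
        b = offered * b / (n + offered * b)
    return b


def max_utilisation(circuits, target_blocking):
    """Carried load per circuit at the largest offered load meeting the target."""
    lo, hi = 0.0, 2.0 * circuits
    for _ in range(60):                      # bisection on the offered load
        mid = (lo + hi) / 2.0
        if erlang_b(circuits, mid) <= target_blocking:
            lo = mid
        else:
            hi = mid
    return lo * (1.0 - erlang_b(circuits, lo)) / circuits


for n in (5, 10, 50, 100):
    print(n, round(max_utilisation(n, 0.01), 3))
```

The utilisation rises with the number of circuits and flattens for large groups, which is the qualitative behaviour described for the middle diagram.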

Comparing the left diagram, which points to decreasing costs for longer 
dedicated VPCs with fewer intermediate switching points, with the middle and 
right diagrams, which indicate reduced costs for shorter, general VPCs which 
can achieve high utilisation, it is clear that we are faced with a problem of 
striking a balance between the two factors. Indeed, as was already suggested 
by (Burgin 1989), the best choice is a compromise where the sum of switching 
costs and bandwidth costs reaches a minimum, i.e. we should consider separate 
VPCs but with some sharing. 

Separate, long VPCs make sense if there is adequately repeated updating 
(as in our automatic methods), and high volumes of traffic (i.e. high statistical 
multiplexing gain) on all VPCs. Sharing is, however, more attractive if 
reasonable link utilisation cannot be achieved by a single traffic flow and should 
therefore be used for VPCs with low volumes of traffic. So we leave end-to-end 
VPCs with high volumes of traffic (for which efficiency is not a big problem) 
but allow end-to-end VPCs with low traffic volumes to be decomposed into 
at most two VPC segments which can be aggregated with other, identical 
segments to allow for a higher degree of sharing. The principle of 
decomposition and aggregation is shown in Figure 2. The figure shows a network with 
three nodes (named 1, 2, and 3) and two links (from 1 to 3 and from 3 to 
2 respectively). The left picture shows a network with three VPCs, the 
one in the middle shows how one VPC is decomposed into two segments, and 
the right one shows how the two segments are aggregated with existing, identical VPCs. 

Figure 2 VPC network design modification by decomposition and aggregation. 


2.2 Formal Notation and Terminology 

We consider VPC networks of N nodes with known capacity matrices C (of
size N x N) where $C_{o,t}$ ($o, t = 1, \ldots, N$, $o \neq t$) denotes the available
bandwidth from node o to node t. All nodes contain VP cross connects and VC 
switching systems. This means that they may originate, terminate, and relay 
traffic either as bundles (VPs) or channels (VCs). To originate or terminate a
channel requires both VP and VC functionality, while relaying can be done
on the VP level only in the VP cross connect, or on the VC level by the 
VC switching system if preceded by VP demultiplexing and followed by VP 
multiplexing. 

For the sake of simplicity we omit conditioning on service classes and limit 
ourselves to the case of a single, uniform service class. (Note that we deploy 
service class (or traffic type) separation between VPC networks. Since we 
study decomposition and aggregation of VPCs within such VPC networks, 
neither the number of service classes (or traffic types) studied nor the specific 
choices will impact our results from a qualitative point of view.) 

All nodes originate traffic to and terminate traffic from all other nodes but
themselves. User demands are fully characterised by a sequence of known
end-to-end traffic demand matrices A(k) (of size N x N), where $a_{o,t}(k)$ denotes
the traffic demand from o to t at time k, $k = 1, \ldots, K$. The time index
k indicates intervals such as hour, day of week, or day of year.

For each traffic matrix there is a corresponding VPC network design matrix
D(k) (of size N x N), the elements $d_{o,t}(k)$ of which contain physical routes and
associated bandwidths (in terms of circuits) for the end-to-end traffic demand
$a_{o,t}(k)$. D(k) is computed from A(k) taking into account the relevant QoS
demands for the service class(es) and the constraints of the physical network C;
for details of the algorithm see (Arvidsson 1995). Our task is now to decompose the lightly
loaded individual VPCs of all designs into two component VPCs of shorter 
length, which then must be aggregated to form shorter, common VPCs with 
higher loads. 

To simplify matters, we judge VPC traffic volumes by a simple threshold 
value (expressed in number of circuits). VPCs above this value are called 
“full VPCs” and are left untouched; VPCs below the value are called “thin 
VPCs” and are candidates for decomposition and aggregation. Typically, the 
threshold is set such that VPCs which can benefit significantly from increased 
volumes are affected. The goal of our decomposition procedure is to refine a
design D(k) into a new, modified design D'(k) which is free from thin VPCs.
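The threshold rule can be sketched in a few lines of Python. The representation of a design as a mapping from node sequences to allocated circuits, the name `split_by_threshold`, and the choice to count a VPC exactly at the threshold as full are assumptions made for this illustration; the text only distinguishes VPCs above and below the value.

```python
from typing import Dict, Tuple

Path = Tuple[int, ...]  # a VPC route given as a sequence of node numbers


def split_by_threshold(design: Dict[Path, int], threshold: int):
    """Partition a design into full and thin VPCs by allocated circuits.

    Assumption: a VPC with exactly `threshold` circuits is treated as full.
    """
    full = {path: c for path, c in design.items() if c >= threshold}
    thin = {path: c for path, c in design.items() if c < threshold}
    return full, thin


if __name__ == "__main__":
    # Hypothetical three-node design in the spirit of Figure 2.
    design = {(1, 3): 40, (3, 2): 55, (1, 3, 2): 12}
    full, thin = split_by_threshold(design, threshold=30)
    print("full:", full)
    print("thin:", thin)
```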



3 RECURSIVE DECOMPOSITION AND AGGREGATION 

Our proposed algorithm examines the thin VPCs one by one, and for each of 
them it tries various decompositions until both segments can be aggregated 
into full VPCs. Segments can be aggregated with existing, full VPCs or with 
existing thin VPCs or other segments if the total bandwidth after the
aggregations makes the thin VPCs/segments qualify as full VPCs. To allow for
the latter option, the outcome of a particular attempt must depend on the
outcome of the following attempts. The algorithm therefore takes a recursive
approach to decomposition and aggregation, where a particular attempt is
judged only when all offspring of that attempt can be judged.

We will now describe the details of the algorithm. The basic components are
four data lists, a main procedure, and a set of supporting functions. The lists 
are fullVPClist, which contains all full VPCs; thinVPClist, which contains 
all thin VPCs; candVPClist, which contains all segments and merged seg- 
ments that have not yet qualified as a full VPC; and failVPClist, which 
contains all thin VPCs which cannot be modified. To speed up the decom- 
position and aggregation, the lists in our implementation are sorted lexico- 
graphically, both forwards and backwards. 

The main procedure first initiates the lists mentioned above and then enters 
a loop where the thin VPCs are examined for decomposition and aggregation 
one by one. For each thin VPC, any node but the first and last ones may act 
as decomposition points. Nodes are tried sequentially for decomposition until 
one that results in successful aggregation has been found: 
procedure main
    form fullVPClist and thinVPClist from design;
    clear candVPClist and failVPClist;
    let selectedPath be the first member of thinVPClist;
    let breakNode be the node after firstNode;
    repeat
        if successful(selectedPath, breakNode) then
            implement pending decompositions and aggregations;
            let selectedPath be the first member of thinVPClist;
            let breakNode be the node after firstNode;
        else if breakNode is not the node before lastNode then
            restore pending decompositions and aggregations;
            let breakNode be the next node;
        else
            restore pending decompositions and aggregations;
            move selectedPath from thinVPClist to failVPClist;
            let selectedPath be the first member of thinVPClist;
            let breakNode be the node after firstNode;
        endif
    until selectedPath is undefined
    stop;
endprocedure

The function successful (path, node) returns true or false depending 
on whether the two segments resulting from decomposing path at node
can be successfully aggregated or not:

function successful(selectedPath, breakNode)
    let prefixPath be selectedPath from firstNode to breakNode;
    let suffixPath be selectedPath from breakNode to lastNode;
    if match(suffixPath, breakNode) and match(prefixPath, breakNode) then
        return true;
    else
        return false;
    endif
endfunction

The function match (path, node) returns true or false depending on whether 
path can be aggregated or not. Successful aggregation can result either from a 
match to an existing member of fullVPClist, or from a match to a member 
of candVPClist such that the member, after aggregation, qualifies as
a full VPC. The function first tries the former option, then the latter one, and
returns with a negative result if both options fail: 

function match(subpath, breakNode)
    if fullVPClist contains a path equal to subpath then
        set pending implementation;
        return true;
    else if extendible(subpath, breakNode) then
        return true;
    else
        return false;
    endif
endfunction



The function extendible (path, node) handles the segments which may 
become full VPCs in candVPClist. It returns true or false depending on 
whether aggregating path with other segments results in full VPCs or not. 
These other segments may already be in candVPClist or tracked down and 
inserted by scanning the remaining members of thinVPClist. The first step is 
to find path in candVPClist. If this is successful, path is added to the existing
member, otherwise path is entered as a new member. In the former case
the existing member may now qualify as a full VPC, in which case the aggregation
is successful. Otherwise more segments must be identified from thinVPClist 
and added on to result in a successful aggregation. The procedure therefore 
scans the latter list for members where a segment equal to path may be 
formed. The scanning continues until a new, full VPC can be formed, or until 
all members of thinVPClist have been tried. Each member scanned is tested 
for successful aggregation of both segments in the same way as in the main 
procedure, i.e. we may apply recursion: 



function extendible(subpath, breakNode)
    let mergePath be the member of candVPClist equal to subpath;
    if mergePath is defined then
        let mergePath be the aggregate of mergePath and subpath;
        if mergePath qualifies as a member of fullVPClist then
            set pending implementation;
            return true;
        endif
    else
        make subpath a new member of candVPClist;
    endif
    repeat
        let mergePath be the next member of thinVPClist
            with a segment equal to subpath;
        if mergePath is defined then
            if successful(mergePath, breakNode) then
                set pending implementation;
                return true;
            endif
        endif
    until mergePath is undefined
    reset pending implementation;
    return false;
endfunction
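To make the break-node search concrete, the following Python sketch implements a deliberately simplified, non-recursive variant: a thin VPC is decomposed only if both segments coincide with existing full VPCs. It omits candVPClist, the recursion over other thin VPCs, and the pending/restore bookkeeping of the procedures above. The name `decompose_once`, the data layout, and the additive treatment of bandwidth on aggregation are assumptions for illustration and not the implementation used in the study.

```python
from typing import Dict, Optional, Tuple

Path = Tuple[int, ...]  # a VPC route given as a sequence of node numbers


def decompose_once(thin_path: Path, circuits: int,
                   full: Dict[Path, int]) -> Optional[int]:
    """Try every interior node of thin_path as break node, in order.

    Returns the first break node at which both segments equal existing
    full VPCs (whose bandwidths are then increased), or None if no such
    node exists. Assumption: bandwidths simply add up on aggregation.
    """
    for i in range(1, len(thin_path) - 1):
        prefix = thin_path[:i + 1]   # firstNode .. breakNode
        suffix = thin_path[i:]       # breakNode .. lastNode
        if prefix in full and suffix in full:
            full[prefix] += circuits
            full[suffix] += circuits
            return thin_path[i]
    return None


if __name__ == "__main__":
    # The three-node situation of Figure 2: links 1-3 and 3-2.
    full = {(1, 3): 40, (3, 2): 50}
    print(decompose_once((1, 3, 2), 10, full), full)
```

A thin VPC for which this function returns None would, in the full algorithm, still have the chance of being aggregated with other segments through the recursive search.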
4 NUMERICAL RESULTS



4.1 Test Scenario 

(a) Networks, Traffics, and Tools 

To facilitate numerical tests, a computer programme was used to generate
eight distinct networks of N = 20 nodes and K = 8 distinct traffic demand
matrices for each network. Requests for connections arrive according
to independent Poisson processes for each origin-termination (OT) pair. The 
connection holding time is assumed to be negative exponentially distributed 
with unit mean. User demands are uniformly distributed between about 5 and 
350 Erlangs per OT pair, with a difference of about ±20% per OT pair from 
one matrix A(k) to another A(k') (corresponding to similar traffic demand
variations over the time of the day, the day of the week etc.). Network
transmission capacities C are set to allow for a VPC network configuration D(0)
with one VPC per OT pair (along the shortest physical route) with a capacity 
that allows a probability of rejection of exactly 10“^, for a traffic which is the 
average over the whole range of traffic matrices A(1), ..., A(K). To give an
idea of the actual test networks, we provide an example of a network and a 
traffic matrix in appendix 1. 

Next, a network simulator was constructed which implements any test
network according to its capacity matrix C and an associated traffic matrix A(k).
To simulate traffic dynamics, user demands change every T = 30 time units
by replacing a traffic matrix A(k) by its successor A(k + 1) in a cyclic fashion
such that A(K) is followed by A(1).
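The cyclic succession of traffic matrices amounts to a simple index calculation, sketched below in Python. The function name, the convention that A(1) is in force from time 0, and the default of K = 8 matrices (consistent with the 64 network and traffic configurations mentioned later) are assumptions for illustration.

```python
def active_matrix_index(time: float, period: float = 30.0, K: int = 8) -> int:
    """1-based index k of the traffic matrix A(k) in force at `time`.

    Assumption: A(1) applies during [0, period), A(2) during
    [period, 2 * period), and so on, wrapping from A(K) back to A(1).
    """
    return int(time // period) % K + 1


if __name__ == "__main__":
    for t in (0, 29.9, 30, 239.9, 240):
        print(t, active_matrix_index(t))
```

With these defaults one full cycle of traffic patterns lasts 240 time units.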



(b) Congestion Control and Routing 

Requests for a connection to a node d arriving at a node o are accepted if 
there is enough free bandwidth available, i.e. if the number of connections 
in progress is less than the allocated capacity (recall that bandwidths are 
expressed as circuits). As indicated before, there may be more than one VPC for
every OT pair and all options are tried for all requests. Paths which have 
been modified into two hops are tried last and require two VCs, one per hop. 
Connections over one hop paths are said to be direct while those over modified 
paths are said to be broken. Requests which cannot find free bandwidth on 
any of these options are rejected. 

To reduce the probability of rejection we can allow rejected requests to hunt 
for free bandwidth on two VPCs in series, i.e. along a two-hop path where 
the first hop is from the origin to an arbitrary intermediate node and the 
second one from the intermediate node to the termination. Noting the ap- 
parent similarities to overflows in circuit switched networks, we have adopted 
the dynamic alternative routing method (DAR) of selecting the intermediate 
node (Gibbens et al. 1989) and deployed trunk reservation (Katschner 1974) 
in order to prevent excessive, inefficient use of this facility. Connections over
such paths are referred to as overflowed. To limit the number of intermediate
nodes per connection, paths which have been modified are not available for
this feature.
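The overflow mechanism can be sketched as follows in Python. This is a minimal illustration of a sticky random choice of intermediate node combined with trunk reservation, not the simulator used in the study: the class name `DarRouter`, the bookkeeping of free circuits per VPC, and the convention that an overflowed request needs strictly more than `reservation` free circuits on both hops are assumptions. Connection release and modified paths are not modelled.

```python
import random
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, int]


class DarRouter:
    """Direct routing first, then one two-hop attempt via a sticky tandem node."""

    def __init__(self, nodes: List[int], free: Dict[Pair, int],
                 reservation: int, seed: int = 0) -> None:
        self.nodes = nodes                  # requires at least three nodes
        self.free = free                    # free circuits per VPC (o, t)
        self.reservation = reservation      # circuits withheld from overflow
        self.tandem: Dict[Pair, int] = {}   # current intermediate node per OT pair
        self.rng = random.Random(seed)

    def _new_tandem(self, o: int, t: int) -> int:
        return self.rng.choice([n for n in self.nodes if n not in (o, t)])

    def route(self, o: int, t: int) -> Optional[List[int]]:
        """Return the node sequence used, or None if the request is rejected."""
        if self.free.get((o, t), 0) > 0:              # direct VPC first
            self.free[(o, t)] -= 1
            return [o, t]
        if (o, t) not in self.tandem:
            self.tandem[(o, t)] = self._new_tandem(o, t)
        m = self.tandem[(o, t)]
        hops = [(o, m), (m, t)]
        # Overflow is admitted only if both hops keep the reserved circuits free.
        if all(self.free.get(h, 0) > self.reservation for h in hops):
            for h in hops:
                self.free[h] -= 1
            return [o, m, t]
        self.tandem[(o, t)] = self._new_tandem(o, t)  # reselect after a failure
        return None


if __name__ == "__main__":
    router = DarRouter([1, 2, 3], {(1, 2): 1, (1, 3): 5, (3, 2): 5}, reservation=2)
    print([router.route(1, 2) for _ in range(5)])
```

The intermediate node is kept as long as overflow attempts succeed and is redrawn at random only after a failure, which is what keeps the method simple.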

A modified path also constitutes two direct paths: one between the ori- 
gin and the intermediate node, and another one between the intermediate 
node and the termination. To achieve maximal statistical sharing, we give 
connection requests between these two pairs full access to the bandwidth
provided by the modified path (i.e. it is not reserved to the original OT pair).
This means that connections between the intermediate node and either end
node can make a modified VPC unable to accept end-to-end connection
requests, even if fewer than the engineered number of end-to-end connections are
in progress. Requests for end-to-end connections which fail in this way are said
to be blocked.



(c) Virtual Path Management 

In the off-line approach to VPC management considered here, the K traffic
matrices and their times of occurrence are assumed to be known in advance.
This allows the K VPC network designs to be computed in advance and
implemented in the network as traffic changes, i.e. design D(k) is followed by
design D(k + 1) etc. and design D(K) by design D(1).

Changing VPC network designs involves connecting, modifying, and closing 
VPCs. A VPC connection is when a new physical route is opened between
two nodes by inserting entries in the routing tables of the VP switching
entities at all nodes along the route, a VPC modification is when an existing
physical route between two nodes is kept but the bandwidth allocated to it is
changed by replacing the contents of the CAC tables of the VC switching
entities at the two end nodes, and a VPC disconnection is when an existing
physical route between two nodes is closed by removing entries from the
routing tables of the VP switching entities at all nodes on the route.

Changing VPC designs may lead to a situation of bandwidth violation. 
This means that the number of connections on a physical link exceeds its 
capacity, i.e. the number it can support at a given cell level QoS. Bandwidth 
violation thus means that cell level QoS is impaired, and may happen if a 
new VPC network design means more bandwidth for some VPCs and less 
for others. A shortage (i.e. a violation) will then occur if (i) the number of
connections in progress on VPCs subject to a bandwidth decrease exceeds the
new bandwidth and (ii) their new, lower limits are reached more slowly than the
number of connections in progress on VPCs subject to a bandwidth increase
moves towards their new, higher limits. In general, the impact of a bandwidth
violation depends on the degree of violation and the time during which it persists.
The problem can be addressed in many ways, e.g. by physical rerouting, where
the excess connections are physically rerouted while in progress (as is done
for handovers in cellular, mobile systems) and stay there for their remaining
time; by virtual rerouting, where excess connections are logically rerouted
by requesting non-saturated VPCs over the same links to increment their
occupancy during the remaining time of the excess connections; by policing,
where policing mechanisms will impose higher cell losses to make excessive
streams conform with the imposed bandwidth requirements; or by ignorance,
where actual violations (i.e. on links where both conditions above are fulfilled)
will suffer from cell losses.
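The violation condition itself, i.e. more connections in progress on a physical link than the link can support, can be checked as in the Python sketch below. The function name, the use of directed links, the counting of both load and capacity in circuits, and the relative measure (excess divided by capacity) are assumptions for illustration.

```python
from typing import Dict, Tuple

Link = Tuple[int, int]
Route = Tuple[int, ...]


def link_violations(routes: Dict[str, Route], in_progress: Dict[str, int],
                    capacity: Dict[Link, int]) -> Dict[Link, float]:
    """Relative degree of bandwidth violation per physical link.

    routes      : physical route (node sequence) of each VPC
    in_progress : connections currently carried by each VPC
    capacity    : circuits each link supports at the required cell level QoS
    Only links whose load exceeds their capacity appear in the result.
    """
    load: Dict[Link, int] = {}
    for vpc, route in routes.items():
        for link in zip(route, route[1:]):   # consecutive node pairs
            load[link] = load.get(link, 0) + in_progress[vpc]
    return {link: (used - capacity[link]) / capacity[link]
            for link, used in load.items() if used > capacity[link]}


if __name__ == "__main__":
    routes = {"a": (1, 3, 2), "b": (1, 3)}
    print(link_violations(routes, {"a": 30, "b": 80}, {(1, 3): 100, (3, 2): 50}))
```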

Each option is associated with costs: in the first two the costs refer to
processing of all potentially dangerous calls, in the third one the costs refer to
cell losses for all calls in the potentially violating flow, and in the fourth one
the costs refer to cell losses for all flows on actually violated links. (Note that
far fewer cells will thus be lost with option four than with option three!)



4.2 Algorithm Performance 
(a) Static performance 

Clearly, the outcome of the algorithm is dependent on the order in which 
thin VPCs are tried. We have adopted the approach of treating the thin VPCs
in descending order of the path length (that is, the number of transmission 
links in a VPC). The rationale behind this is that in general it is harder 
to decompose a long VPC into two segments and hence it is better to deal 
with these at an early stage when more alternatives are available by way of 
recursion of other thin VPCs. In addition to processing the thin VPCs in 
descending path length, for the same path length we process the thin VPCs 
in ascending order of offered traffic or allocated circuits (as the case may be) 
since it is expected to be harder to accumulate traffic to satisfy the traffic
threshold requirement for a VPC with a smaller volume of traffic. 
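The processing order described above corresponds to a two-level sort, sketched here in Python. The representation of a thin VPC as a pair of node sequence and allocated circuits, and the name `processing_order`, are assumptions for illustration.

```python
from typing import List, Tuple

Path = Tuple[int, ...]  # a VPC route given as a sequence of node numbers


def processing_order(thin: List[Tuple[Path, int]]) -> List[Tuple[Path, int]]:
    """Longest paths first; among equal lengths, fewest circuits first.

    Path length is the number of transmission links, i.e. nodes minus one.
    """
    return sorted(thin, key=lambda vpc: (-(len(vpc[0]) - 1), vpc[1]))


if __name__ == "__main__":
    thin = [((1, 2), 10), ((1, 3, 2), 20), ((4, 5, 6), 5), ((1, 2, 3, 4), 25)]
    for path, circuits in processing_order(thin):
        print(path, circuits)
```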

We have tested our procedure on the VPC designs D(k) for all 64 network
and traffic configurations. The initial designs contain between 300 and 400 
VPCs. Using a threshold value of 30 circuits, the numbers of thin VPCs lie 
in the range of 100 to 150. 

After applying the procedure, the numbers of unmodified thin VPCs lie in 
the range of 0 to 16, the average number is about 6.5. The numbers of new 
full VPCs formed lie in the range of 10 to 31, with an average of about 19. 
This means that for these originally thin VPCs, no intermediate switching 
of their end-to-end traffic is required because of success in traffic aggregation 
from longer thin VPCs that use these VPCs as one of their two segments. 
The numbers of new VPCs formed lie in the range of 0 to 9 with an average
of about 3 (recall that new VPCs are formed to carry transit traffic only).

For two test cases, we have inverted the order of processing thin VPCs, 
that is, we process the thin VPCs in ascending order of path length. In one 
test case, the number of unmodified thin VPCs has gone up from 6 to 10. In 
another case, the number of unmodified thin VPCs has gone up from 7 to 12. 
Table 1 Total results expressed as occurrences.

Metric                           Direct               Overflow
used                          Mod.     Ori.        Mod.     Ori.

VCC rejected (%)            1.4692   1.6896      0.7316   0.8679
VCC broken (%)              1.6445        -      1.6890        -
VCC blocked (%)             0.0451        -      0.0796        -
VCC overflowed (%)               -        -      0.9901   1.2281

VPC modifications (%)       0.0239   0.0279      0.0239   0.0279
VPC connections (%)         0.0187   0.0391      0.0187   0.0391
VPC disconnections (%)      0.0186   0.0389      0.0186   0.0389

Bandwidth violation (%)     0.0009   0.0008      0.0020   0.0017

(b) Dynamic Performance 

Simulating each network for 4,800 time units (corresponding to 20 cycles 
of traffic patterns or about 150 million connections), we obtain the results 
shown in Table 1. The large number of connections per network means that 
the confidence obtained for each network is very high. To limit the amount 
of data, the tables show averages over all networks. The variance that follows 
from network differences is omitted since we do not think it is particularly 
significant to this work but would make the tables harder to read. 

The columns refer to specific combinations of strategy for routing (direct 
routing only or overflow routing applied) and design (modified or original). 
The rows are divided into three groups, the first one relates to the handling
of VCCs, the second one to the handling of VPCs, and the third one to 
bandwidth violations. 

The first group gives the fractions of requests resulting in rejected, broken, 
blocked, and overflowed connections respectively. As expected, it is seen that
both modification and overflow routing reduce rejection and consequently
improve utilisation. With modified networks, a little over 1.5% of all requests
try broken connections, and the number increases if overflow routing is
used. The small number follows from the restrictive usage in terms of modi- 
fication threshold and path search order, and the increase is a result of more 
extensive usage following from the overflow option. Blocking events exhibit a 
similar behaviour, where approximately one request out of 2,000 arriving ones 
(or one out of 30 for which modification is tried) is blocked. With overflow 
routing, about 1% of all requests are handled as overflowed connections. The 
small number follows from the ability of the networks to satisfy most requests 
as direct connections, and the lower value noted for modified designs follows 
from the fact that overflow routing has less to add when design modification 
already has made more circuits available to a wide range of OT pairs. 
Table 2 Revenues and expenses assumed in the performance evaluation.

Action                                                   Gain         Cost

Carrying a direct VCC                                    1.00
Carrying a broken or overflowed VCC                      1.00         0.01
Performing a VPC connection (per node)                                0.10
Performing a VPC modification (per node)                              0.10
Performing a VPC disconnection (per node)                             0.10
Bandwidth violation (per link, percentage, and time)             10,000.00



The second group contains the number of modifications, connections, and 
disconnections of VPCs per generated call request, and the average degree of 
bandwidth violation. It is seen that the number of modifications, connections, 
and disconnections are independent of the VCC routing strategy but drop 
somewhat with modification. The first observation follows immediately from 
the fact that the VPC network design algorithm does not take the routing 
strategy into account. The second observation is related to the decrease in 
number of VPCs in modified designs, a conclusion which is supported by 
noting that the drop is strong for connections and disconnections, but only 
weak for modifications. 

Finally, it is seen in the third group that bandwidth violations are marginal
in all cases. The larger numbers noted for modified designs and overflow
routing, respectively and in combination, follow from the associated higher
utilisation.

Our overall purpose is to maximise network profit, i.e. the difference be- 
tween revenues and expenses. Revenues come from charging customers for the 
usage of services, and expenses are associated with maintaining the network 
and providing the services. To obtain an overall metric of network profit we 
introduce a monetary unit which is used to express all revenues and expenses. 
Our choices, inspired by discussions with operators, are summed up in Table 
2. Although one can think of different values, we feel that the order of mag- 
nitude of our choices is reasonable, and they allow us to capture all aspects 
of our study into a single metric. 

The monetary unit is set such that a direct connection represents a gain of
1.00 at no cost. A broken or overflowed connection represents the same gain,
but with an additional cost of 0.01 to account for the additional overhead
and reduced QoS associated with a second VC. VPC management actions
represent costs of 0.10 per node (corresponding to ten times the cost of a VC
action, a relatively high number we believe). Finally, bandwidth violations
represent a cost of 10,000.00 per link, per relative degree of violation, and per
time unit. Table 3 shows the same results as in Table 1 expressed as monetary
units.

Table 3 Total results expressed as monetary units.

Metric                              Direct                 Overflow
used                            Mod.       Ori.        Mod.       Ori.

Revenues per time unit      30140.61   30139.85    30140.48   30139.54
Expenses per time unit        458.88     516.50      248.50     280.75

Profitability (%)              98.48      98.29       99.18      99.07
Improvement potential (%)       1.52       1.71        0.82       0.93

Table 4 Separated results expressed as monetary units.

OT pair     Metric                          Direct             Overflow
category    used                         Mod.     Ori.      Mod.     Ori.

Direct      Profitability (%)           98.84    98.55     99.30    99.17
paths       Improvement potential (%)    1.16     1.45      0.70     0.83

Mixed       Profitability (%)           97.92    97.88     98.91    98.86
paths       Improvement potential (%)    2.08     2.12      1.09     1.14

The first group of rows presents the absolute results per time unit and the 
second group gives some normalised results. In the first group it is seen that 
the networks are offered about 30,000 Erlangs, and the expenses per time 
unit are small compared to the revenues. The second group considers the 
actual profit (revenues minus expenses) in relation to the theoretical optimum 
(maximal revenues and no expenses). It is seen that all four cases are close to 
full profitability, but what is more important is the remaining improvement 
potential. Compared to the case of neither modification nor advanced routing
with an improvement potential of about 1.7%, deploying both of them exploits 
about half that potential down to 0.82%. It is also seen that the improvement 
of advanced routing (0.78%) is larger than that of modification (0.19%), and 
that the two features are partly overlapping since the total improvement of 
both actions (0.89%) is less than the sum of the individual improvements 
(0.97%). 
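The percentages quoted above can be re-derived from the revenues and expenses of Table 3. The Python sketch below assumes that profitability is computed as (revenues - expenses) / revenues; this reading is an assumption, although it reproduces the tabulated values to two decimals. The dictionary and function names are hypothetical.

```python
# (routing, design): (revenues, expenses) per time unit, taken from Table 3
TABLE_3 = {
    ("direct", "modified"): (30140.61, 458.88),
    ("direct", "original"): (30139.85, 516.50),
    ("overflow", "modified"): (30140.48, 248.50),
    ("overflow", "original"): (30139.54, 280.75),
}


def profitability(revenues: float, expenses: float) -> float:
    """Profit relative to revenues, in per cent (assumed definition)."""
    return 100.0 * (revenues - expenses) / revenues


def improvement_potential(revenues: float, expenses: float) -> float:
    """Remaining distance to full profitability, in per cent."""
    return 100.0 - profitability(revenues, expenses)


def gain(case: tuple) -> float:
    """Reduction in improvement potential relative to direct routing, original design."""
    base = improvement_potential(*TABLE_3[("direct", "original")])
    return base - improvement_potential(*TABLE_3[case])


if __name__ == "__main__":
    for case, (rev, exp) in TABLE_3.items():
        print(case, round(profitability(rev, exp), 2),
              round(improvement_potential(rev, exp), 2))
    print("routing only     :", round(gain(("overflow", "original")), 2))
    print("modification only:", round(gain(("direct", "modified")), 2))
    print("both             :", round(gain(("overflow", "modified")), 2))
```

The gain from both features together is smaller than the sum of the two individual gains, which is the overlap noted in the text.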

To get a deeper understanding of the results, we have also conducted
separate measurements for OT pairs with and without modified VPCs in their
designs $d_{o,t}(k)$. Table 4 gives the results in monetary units for OT pairs with
direct VPCs only (upper rows) and with some modified VPCs (lower rows).

The most important conclusion is that both classes of OT pairs benefit from
modification and advanced routing. It is also interesting to note that OT pairs
without modified VPCs benefit the most from modification, an observation
which is attributed to the fact that more bandwidth is made available to
them, while OT pairs with modified VPCs, which require free bandwidth
on two VPCs at the same time, cannot access the bandwidth to the same
extent. On the other hand the situation is reversed with advanced routing, an
observation which leads to the conclusion that generally disadvantaged OT
pairs are more likely to have their paths modified.

Figure 3 Profitability of modification vs. VCC and VPC costs for direct
routing (left) and overflow routing (right). Horizontal axis in both panels:
VCC cost factor.

Finally, we study the sensitivity of our conclusions with respect to costs. The 
diagrams in Figure 3 show the cost ranges in which modification is profitable 
for simple and advanced routing respectively. VCC costs refer to the additional
expense associated with broken or overflowed VCCs, while VPC costs refer to
the expense associated with connecting, modifying, or disconnecting a VPC.
As expected, it appears that modification is always profitable if the VCC cost
is little or none, and that the higher the VCC cost, the higher the VPC
cost must be for modification to make sense.



5 CONCLUSIONS AND FURTHER WORK 

This work represents a first attempt at investigating the trade off between 
VPC and VCC switching. Earlier works in the area, e.g. (Burgin 1989), have 
come to the same conclusion, i.e. that VPC and VCC switching should be 
used in combination, but this study is, to the author’s knowledge, the first 
with this level of detail and the first to study dynamic aspects. Building on our 
earlier results on optimal management on fully interconnected VP networks, 
we have proposed a method to identify and eliminate inefficient VPCs. We 
have also been able to demonstrate how the method can be applied successfully
to improve network profitability. 

An obvious issue for further investigations is the question of optimal
modification in terms of choosing the best threshold. Looking at the pros and cons
of VPC and VCC routing, it is obvious that this threshold must depend on 
the relationship between VPC and VCC management costs as well as on the 
multiplexing characteristics of each service class. 

Another interesting issue is to do iterative designs in which the original 
traffic demand matrix and the new network design matrix are combined into 
a new traffic demand matrix which again is subject to VPC network design 
followed by possible modification. The same procedure may then be repeated 
over and over again until the demand matrices converge. 

A further point of interest is to apply a threshold to a complete OT pair
rather than a single VPC. In this way inefficient flows are handled in a more
collected effort. We would also like to conduct a series of tests for the case
where VPC designs are computed in real time based on on-line estimations of
traffic demands.

Finally, the complexity of our algorithm and alternative algorithms are
strong points of interest.
APPENDIX 1 TEST NETWORKS

Below we give an idea of the kind of test networks used by showing a sample
topology, Figure 4, a sample capacity matrix, Table 5, and a sample demand
matrix, Table 6. A complete description of the algorithm used to generate the
networks is given in (Arvidsson 1995).
Table 5 A sample capacity matrix.

   OT      C       OT      C       OT      C       OT      C       OT      C

  1-2   2570      1-6    600      1-8    670      1-9   1560     1-13   2160
 1-16   1510      2-5   1320      2-8    500     2-12   1190     2-16   2270
  3-9   1450     3-10   1690     3-15    510     4-11    540     4-14   2560
 4-18     90     5-12   1350     5-13   1090      6-9   2760     6-10    330
 6-16    410     6-19    740     6-20   3130      7-8   1540     7-11   1350
 7-14   4030     7-16   4540     7-17    500     7-19    610     7-20    650
 8-12    930     8-16    580     8-17    810     8-20    560     9-10   1070
 9-13   1410     9-15   2280     9-16   2980    10-19   2070    11-14    590
11-18   1140    11-19    960    11-20   2030    12-17   2170    13-15   1610
14-17   2360    14-18   2130    14-19   1580    16-17   1200    16-20   1060
19-20    800

Table 6 A sample demand matrix. 



OT 


A 


OT 


A 


OT 


A 


OT 


A 


OT 


A 


1 - 2 


156.6 


1 - 3 


131.6 


1 - 4 


190.2 


1 - 5 


85.6 


1-5      85.6
1-6      46.2
1-7      44.9
1-8      30.6
1-9      54.4
1-10     140.9
1-11     111.5
1-12     223.1
1-13     113.8
1-14     42.4
1-15     72.6
1-16     35.4
1-17     77.8
1-18     53.1
1-19     132.5
1-20     124.4
2-3      313.4
2-4      135.6
2-5      18.4
2-6      90.1
2-7      11.5
2-8      93.6
2-9      71.0
2-10     37.3
2-11     162.6
2-12     136.7
2-13     204.2
2-14     27.4
2-15     202.7
2-16     178.4
2-17     257.2
2-18     36.0
2-19     305.4
2-20     207.9
3-4      283.2
3-5      121.1
3-6      35.0
3-7      20.8
3-8      23.3
3-9      64.2
3-10     315.9
3-11     270.3
3-12     206.5
3-13     163.1
3-14     276.6
3-15     80.5
3-16     191.6
3-17     106.4
3-18     234.3
3-19     195.9
3-20     339.5
4-5      52.5
4-6      282.1
4-7      257.6
4-8      148.4
4-9      160.9
4-10     13.6
4-11     142.7
4-12     273.6
4-13     134.1
4-14     33.2
4-15     11.6
4-16     77.2
4-17     22.4
4-18     67.5
4-19     256.9
4-20     40.0
5-6      112.1
5-7      151.2
5-8      24.4
5-9      311.8
5-10     14.4
5-11     205.0
5-12     261.8
5-13     9.4
5-14     245.5
5-15     264.6
5-16     204.4
5-17     161.8
5-18     99.3
5-19     48.5
5-20     118.5
6-7      248.3
6-8      50.6
6-9      249.3
6-10     272.3
6-11     118.4
6-12     28.2
6-13     104.7
6-14     244.3
6-15     252.4
6-16     167.0
6-17     126.4
6-18     246.0
6-19     15.9
6-20     120.1
7-8      128.0
7-9      304.4
7-10     108.2
7-11     85.8
7-12     246.5
7-13     32.4
7-14     150.1
7-15     247.1
7-16     274.7
7-17     70.5
7-18     191.6
7-19     195.2
7-20     44.2
8-9      60.9
8-10     60.4
8-11     175.7
8-12     191.6
8-13     242.3
8-14     294.7
8-15     106.8
8-16     245.9
8-17     277.8
8-18     250.6
8-19     22.0
8-20     226.6
9-10     151.2
9-11     259.1
9-12     274.9
9-13     161.3
9-14     13.6
9-15     200.7
9-16     328.8
9-17     175.3
9-18     53.7
9-19     108.4
9-20     322.5
10-11    155.9
10-12    199.5
10-13    230.6
10-14    297.6
10-15    100.3
10-16    13.4
10-17    151.4
10-18    105.1
10-19    39.8
10-20    111.5
11-12    171.1
11-13    149.3
11-14    210.2
11-15    49.0
11-16    265.2


11-17 
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12-13 


92.2 
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12-16 
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71.6 


13-20    223.2
14-15    156.7
14-16    115.6
14-17    203.6
14-18    157.6
14-19    132.8
14-20    154.5
15-16    209.5
15-17    276.4
15-18    45.3
15-19    133.2
15-20    328.1
16-17    159.2
16-18    192.1
16-19    169.2
16-20    132.2
17-18    138.2
17-19    239.7
17-20    119.9
18-19    73.4
18-20    298.1
19-20    6.4
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Abstract 

Owing to the development of optical technology, the dream of "all optical
networks" has come true. Requiring no conversion between the optical and
electronic domains, all optical networks are well suited to fulfill the
growing demand for bandwidth. One of the popular methods for operating all
optical networks is the "deflection" principle, which differs from methods
such as diffusion and virtual circuits, used in the Internet and in ATM
networks respectively. In this paper, we first compare possible
high-bandwidth solutions able to interconnect sources in a metropolitan
area through all optical links. Secondly, we define and study two deflection
routing methods, named the "simple" method and the "configuration" method,
based upon knowledge of the network state. Our investigation uses both
analytical and simulation approaches.



Keywords: deflection, networking structure, routing, analysis, simulation 
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1 Introduction 

All optical networks rely on optical techniques for both transmission (fibres) 
and switching (optical switches). The incredibly large amount of bandwidth 
optical devices provide is certainly a major component in future networks, and 
in fact it has already been considered and shown to be well suited to fulfill the 
growing demand on 1) bandwidth per user, 2) protocol transparency, 3) higher 
path reliability, and 4) simplified operation and management [6]. 

Compared to traditional networks making use of electronic devices, one
critical issue for such networks is the lack of storage capability. The only
way current optical technology can supply the required buffers for storage
is through delay lines. However, the number of fibre delay lines is very
limited. For this reason, new solutions to deal with contention situations
and network optimization choices are needed. One of the popular methods for
operating all optical networks is "deflection". When two or more packets
attempt to access the same output at the same time, the node lets one packet
through and routes the others towards idle outputs according to some
criterion.

In fact, deflection routing is not only a solution for lack of storage capability 
but also for congestion control and bandwidth allocation. Deflection networks 
have no internal congestion as opposed to present ATM or IP networks ex- 
hibiting bottlenecks. All packets entering the network will exit, resulting in 
guaranteed packet delivery and no packet loss. In addition, deflection networks 
offer the possibility of total bandwidth allocation to a communication between 
two end-users, so that they are well suited to situations with day-to-day traffic 
variation or uncertain forecasts. 

Most studies of deflection networks [1, 4, 7] are based on 2 x 2
switches. In the present paper, the availability of 4 x 4 switches is assumed.
First, we present some simple arguments to choose between basic architectural
alternatives in the framework of Metropolitan Area Networks (MAN), through
all optical links using the deflection method. A comparison is given between
the "Manhattan Street" architecture and the "star" architecture. The
comparison addresses such aspects as the optimization of the total length of
fibre involved and the traffic capability, through simple dimensioning
arguments.

The comparison favors the "Manhattan Street" architecture, which is thus
considered in the sequel, devoted to a study of deflection routing. The
complete definition of the routing algorithm allows great flexibility. In
particular, the choice of the "favored" packet (the one being selected in
case of contention) may be based upon various arguments (such as the "age"
of conflicting packets, the distance to the destination, or more generally
the knowledge the nodes have about their neighbourhood). We define and study
two routing methods based upon the (partial) knowledge the nodes have about
the network state. The methods, named the "simple" method and the
"configuration" method, differ in the degree of knowledge.

The study yields the throughput of a connectionless network using deflection
routing. It shows that the "configuration" method is better than the "simple"
method. This is probably because the "configuration" method depends on the
network-wide information which is available at the node.

An analytical model is proposed under the assumptions of "independence"
between nodes and uniformity of the traffic. The model is validated through a
simulation experiment.







2 Network Architectures 

2.1 Some Networking Architectures 

Most research on deflection routing is based upon regular topologies such as the 
"Manhattan Street" networks with bidirectional links (torus, see Figure 1, or 
grid, in which the "backplane" links are suppressed), and the "Shuffle-exchange" 
networks since they offer the same number of inputs and outputs at each node 
[3, 9]. Among these architectures, the "Manhattan Street" network seems more 
adequate for the following reasons: 

• (1) Most often, there exist two paths with the same shortest distance to
the destination.

• (2) The cost of a deflection is constant (2 additional links).




Figure 1: A bi-directional "Manhattan Street" torus 



Two variants of the "Manhattan Street" architecture have been considered,
namely the torus and the grid. Figure 1 displays the diagram of the torus,
characterized by identical nodes with 4 input links and 4 output links, and
an additional local I/O access, referred to as a 4 x 4 switch. The end nodes
wrap around: row (or column) edges are connected, leading to identical switch
nodes in terms of the number of inputs and outputs.

On the other hand, the "Manhattan Street" grid has no connections between
the end points (row as well as column). Nodes are not equivalent, and three
node matrices (the corner nodes, the edge nodes and the general nodes) must
be considered to specify the "Manhattan Street" grid architecture.
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The star architecture provides a simpler solution to connect nodes in a 
metropolitan environment [10]. It offers many advantages at the physical level
of all optical networks [8]. This raises the question of which of the
"Manhattan Street" network and the "star" network is best suited to a
Metropolitan Area Network (MAN) architecture. In this Section, a comparison of
these alternatives is given in terms of the total fibre length and of the
traffic carried.

The "Manhattan Street" torus and grid have also been compared. In short, a
"Manhattan Street" torus is far better than the grid version. The details of
the comparison are presented in [10].



2.2 The Grid Solution

2.2.1 Total Fibre Length 

We use the following notations and assumptions.

• L : the "diameter" of the network, i.e. the length of each row / column
of the Manhattan Street network;

• N : the number of columns or rows;

• K : the total number of sources;

• c : the packet speed in the fibre;

• m : the time needed to send one packet, so that cm is the spatial extension
of the packet (the total fibre length a packet occupies);

• $\Lambda$ : the total arrival rate of packets in the network, and $\lambda$
the arrival rate per node ($\lambda = \Lambda/N^2$ for uniformly distributed
load).

We assume that each link is composed of two fibres to ensure bi-directional
connectivity (see the structure of the node in Figure 1). The total fibre
length can be written as follows:

$$\text{length} = 4NL + \frac{KL}{N}, \qquad (1)$$

where the first term represents the total fibre length of the network (2N
rows and columns, each bidirectional), and the second term is the total fibre
length of the distribution network (connection of each source to its nearest
neighbour node, assuming sources are uniformly scattered in the area; the
length is around half the elementary square side, and 2 fibres are provided).
The optimization problem can be stated as follows: assuming a given number K
of traffic sources, how many rows/columns should be used? Setting the
derivative of equation (1) with respect to N to zero, the number of rows or
columns N may be optimally chosen as:

$$N = \frac{\sqrt{K}}{2}. \qquad (2)$$

Using this value of N in equation (1), it is shown that the total fibre length
is equally shared between the core network and the access network, and that
each node serves 4 sources.
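
The optimisation above can be checked numerically. The following is a minimal sketch, assuming the total length of equation (1) has the form 4NL + KL/N; the function names and the sample values K = 64 and L = 10 are illustrative assumptions, not taken from the text.

```python
import math


def total_fibre_length(n: float, k: int, l: float) -> float:
    """Equation (1): core network 4NL plus access network KL/N."""
    return 4 * n * l + k * l / n


def optimal_rows(k: int) -> float:
    """Equation (2): the N that minimises equation (1), i.e. sqrt(K)/2."""
    return math.sqrt(k) / 2


# Illustrative values (assumptions): 64 sources, rows of length 10.
K, L = 64, 10.0
N = optimal_rows(K)
core_length = 4 * N * L          # fibre in the rows and columns
access_length = K * L / N        # fibre connecting sources to nodes
sources_per_node = K / N ** 2    # K sources shared among N^2 nodes

print(N, core_length, access_length, sources_per_node)
```

With these values the core and access terms come out equal and each node serves four sources, which matches the statement above; non-integer N would have to be rounded in a real design.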
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2.2.2 Maximum Carried Traffic in Network and Maximum Traffic 
per Source 



Let us assume that the order of magnitude of a connection length is L, so that
the mean packet delay in the network is L/c. Each packet occupies a length on
the fibre equal to cm. The total fibre length in the transit network is 4NL,
so 4NL/cm packets may be simultaneously stored in the network. According to
Little's formula, the maximum carried flow $\Lambda$ of the network is as
follows:

$$\Lambda < \frac{4NL}{cm} \cdot \frac{c}{rL} = \frac{4N}{rm}, \qquad (3)$$

where r (> 1) is a factor which takes account of the additional cost (in link
utilization) of the "deflection" method. Equation (3) shows that the
throughput depends only on N, r and m but not on L. Assuming the sources offer
identical traffic, this yields the maximum packet flow intensity per source.
Expressed in terms of the admissible load, one gets:

$$\text{Load/Source} = \frac{4N}{rK}. \qquad (4)$$

Choosing for N the optimum value (2), and taking for r the ideal value
r = 1, one gets an upper bound for the admissible load per source:

$$\text{Load}_{\max} = \frac{1}{N}. \qquad (5)$$

Note that the above equation means that the maximum load per source (and
also the maximum load per node) decreases as network size increases. This
result is discussed in Section 6.
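
The bounds of this section can be tabulated with a few lines of code. This is only a sketch, assuming that equation (3) reads 4N/(rm) and equation (4) reads 4N/(rK); the function names are hypothetical.

```python
def max_carried_flow(n: int, m: float, r: float = 1.0) -> float:
    """Assumed form of equation (3): upper bound 4N/(r m) on the packet flow."""
    return 4 * n / (r * m)


def load_per_source(n: int, k: int, r: float = 1.0) -> float:
    """Assumed form of equation (4): admissible load per source, 4N/(r K)."""
    return 4 * n / (r * k)


# With the optimal N of equation (2), K = 4 N^2, and the ideal r = 1,
# the admissible load per source reduces to 1/N.
for n in (2, 4, 8):
    k = 4 * n * n
    print(n, k, load_per_source(n, k))
```

The loop shows the admissible load per source halving each time N doubles, which is the decrease with network size noted above.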




Figure 2: A bi-directional ’’star’’ structure 



2.3 The "Star Network" Solution 

The "star" network is built according to Figure 2 in which each connection has 
a length around L/2. In this case, the total fibre length could be equal to LK, 
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while the maximum flow carried is given by Equation (3). Since the maximum 
flow carried is the same in the "Manhattan Street" and in the "star"
architecture, the comparison is restricted to the total fibre length.



2.4 Comparison in Total Fibre Length 

One may compare the equations giving the total length in the two architectures.
Equation (6) below gives the condition under which preference is given to
the "Manhattan Street" network in terms of total fibre length.

$$4N^2 + K < KN \qquad (6)$$

This expression may be transformed to give the condition in terms of the
number of sources. In the general case, the condition under which the
"Manhattan Street" network should be preferred is:

$$K > \frac{4N^2}{N - 1}. \qquad (7)$$

In the case where N is chosen optimally according to (2), it is immediate
to show that the "Manhattan Street" network uses less fibre length as soon as
K > 16.
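
The threshold K > 16 can be verified directly. The sketch below assumes a Manhattan Street length of 4NL + KL/N with N = sqrt(K)/2 and a star length of LK, as in the preceding sections; the function names are hypothetical and N is allowed to be non-integer for simplicity.

```python
import math


def manhattan_length(n: float, k: int, l: float) -> float:
    """Total fibre length of the Manhattan Street network (assumed eq. 1)."""
    return 4 * n * l + k * l / n


def star_length(k: int, l: float) -> float:
    """Total fibre length of the star: K connections of length L/2, 2 fibres."""
    return k * l


def manhattan_preferred(k: int, l: float = 1.0) -> bool:
    """True if the optimally sized Manhattan Street network uses less fibre."""
    n = math.sqrt(k) / 2
    return manhattan_length(n, k, l) < star_length(k, l)


for k in (9, 16, 17, 64):
    print(k, manhattan_preferred(k))
```

At K = 16 the two lengths coincide, and the Manhattan Street network becomes strictly shorter for any larger K.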



2.5 The Choice of an Architecture 

The conclusions we draw from this short analysis confirm the impression one
gets from the literature survey, namely that the "Manhattan Street"
architecture seems to be the most valuable one. So in the sequel the analysis
is restricted to this case.



3 Node Operation 

Without any buffering capability, the basic node consists of five inputs and
five outputs according to the right-hand part of Figure 1. Four inputs (or
outputs) are connected to the neighboring nodes, and one input (or output)
links the local source. As mentioned earlier, for a "Manhattan Street" torus
all nodes have the same structure.

To analyze performance, the deflection network is assumed to be packet 
switched and time slotted, with fixed slots. As a consequence, the network is 
synchronous. New packets are generated independently at each source according 
to a Bernoulli process, thus providing a uniformly distributed traffic density. 
Each node receives arriving packets at the beginning of a time slot, and sends 
them at the end of the time slot. 

Top priority is assigned to packets in transit over new packets. A new
packet can enter the network only if at least one of the transit outputs of
the associated node is idle. The goal is to protect packets in transit,
which have already consumed bandwidth.
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In the model, destinations of new packet are chosen independently and uni- 
formly at random from all nodes not coinciding with their sources. 
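
The admission rule of this section can be sketched as follows. The function name, its parameters and the choice to simply discard a blocked arrival are assumptions made for illustration; the text does not say what happens to a new packet that finds no idle transit output.

```python
import random


def new_packet(source: int, num_nodes: int, load: float,
               idle_transit_outputs: int, rng: random.Random):
    """Return the destination of an admitted new packet, or None.

    One call models one time slot at one source node.
    """
    # Bernoulli arrival: a packet is generated with probability `load`.
    if rng.random() >= load:
        return None
    # Packets in transit have priority: admit only if an output is idle.
    if idle_transit_outputs == 0:
        return None
    # Destination chosen uniformly among all nodes except the source.
    dest = rng.randrange(num_nodes - 1)
    return dest if dest < source else dest + 1


rng = random.Random(0)
print([new_packet(3, 36, 0.5, 2, rng) for _ in range(5)])
```

Nodes are numbered 0 to num_nodes - 1 here, so a 6 x 6 network corresponds to num_nodes = 36.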

4 Deflection Routing 

4.1 Different Approaches 

The choice of the packet to be deflected has an influence on the overall perfor- 
mance. Here, we define and compare two methods which differ by the degree of 
knowledge a node has of its neighborhood. 

The simple method works as follows: each packet is preferably routed to
its destination according to the "shortest path" criterion. A packet may be
deflected if its preferred link is seized by another packet which has been
processed previously (either the node has a fixed scanning policy so that
packets are processed in a given order, or packets are processed in a random
order).

In the second method the node processes all incoming packets in a single
operation. Among all routing combinations, the node chooses the one which
minimizes the sum of the distances of packets to their destinations. We refer
to this algorithm as the configuration method.

Other methods are applicable, which use the "age" of the packets as a
preference criterion. The comparison between our approaches and these methods
is beyond the scope of the present study.

The increased complexity of the configuration method prompts us to verify the
gain which is to be expected (note that the high speed at which packets must
be processed makes the complexity of the routing operation a key factor).
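
The difference between the two methods can be illustrated at a single node. This is a minimal sketch, assuming a torus with N rows and columns, packets identified by their destination coordinates, and a first-found tie-break; the names and the tie-break rule are hypothetical and are not taken from the text.

```python
from itertools import permutations

# Four transit outputs of a torus node and the step each one makes.
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def torus_dist(a, b, n):
    """Shortest hop count between nodes a and b on an n x n torus."""
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    return min(dx, n - dx) + min(dy, n - dy)


def dist_after(node, out, dest, n):
    """Remaining distance to dest if the packet leaves node through out."""
    step = DIRS[out]
    nxt = ((node[0] + step[0]) % n, (node[1] + step[1]) % n)
    return torus_dist(nxt, dest, n)


def simple_method(node, dests, n):
    """Process packets in order; each takes the best output still free."""
    free = list(DIRS)
    chosen = []
    for dest in dests:
        out = min(free, key=lambda o: dist_after(node, o, dest, n))
        free.remove(out)
        chosen.append(out)
    return chosen


def configuration_method(node, dests, n):
    """Try every assignment; keep the one with the smallest total distance."""
    def total(outs):
        return sum(dist_after(node, o, d, n) for o, d in zip(outs, dests))
    return list(min(permutations(DIRS, len(dests)), key=total))


# Two packets at node (0, 0) of a 6 x 6 torus.
packets = [(2, 1), (2, 0)]
print(simple_method((0, 0), packets, 6))         # ['E', 'W']
print(configuration_method((0, 0), packets, 6))  # ['N', 'E']
```

In this example the first packet has two shortest-path outputs and the second only one. Processing them in order lets the first packet take the output the second one needs, causing a deflection, whereas the joint choice routes both along shortest paths.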

4.2 Comparison of the Methods by Simulations 

By simulation, we compare the two routing methods in terms of throughput
(see Figure 3). The network has N = 6 rows and columns.

Briefly, network throughput (packets/slot) increases as the load increases
from 0 packets/slot to 1 packet/slot per node. The maximum network throughput
with the "configuration" method reaches 0.83 packets/slot against 0.57
packets/slot for the "simple" method. Figure 3 shows clearly the advantage
of the "configuration" method over the "simple" one. The explanation is
simple: the "simple" method allocates resources on the basis of more local
information, while the "configuration" method incorporates more global
information in its decision making process.

The same kind of results could be derived for the other parameters of
interest. For instance, the average end-to-end delay is significantly shorter
for the "configuration" method (the curves are omitted here, being of little
interest: see below for the delay in the "configuration" method).

Consequently, under light traffic conditions, there is little point in looking
for an optimized approach. However, for medium to large loads, the "configuration"
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algorithm is worth considering. The gain, in terms of throughput is quite obvi- 
ous. 




Figure 3: Node throughput: "configuration" versus "simple" method



5 An Analytical Model 

Previous analytical studies on deflection networks [1, 4, 7] have proposed
accurate models to evaluate the behaviour of this routing mechanism. When
these approaches are used on larger switching nodes (here, 4 x 4 nodes instead
of 2 x 2), they fail to give accurate results. This has led us to conduct a
deeper analysis of the phenomenon, allowing a more sophisticated model to be
built which provides reasonable accuracy (a previous model, based on the same
assumptions, gave quite poor results, see e.g. [5]).

5.1 Notation 

We now use the following additional notations (the "types" are explained in
the next section):

• d : the effective distance between the source of a packet and its
destination.

• $d_0$ : the length of the shortest path from the source node to the
destination.

• D, $D_0$ : the mean values of d and $d_0$.
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• u : link utilization; u\ (resp. U 2 ) denote the utilization due to type-1 (resp 
type-2) packets, see below.

• $n_i(k)$ : average number of packets of type i (see below) at distance k
from their destination, and $n_i$ the total number: $n_i = \sum_k n_i(k)$.

• $p_i^d$ : deflection probability of a type-i packet.

• $P(j_1, j_2)$ : probability that one typical packet meets $j_1$ type-1
packets and $j_2$ type-2 packets in a node.

• $\omega_i(k)$ : the flow of packets entering the network at distance k from
their destination and being of type i.

5.2 Routing 

All packets arriving at the beginning of a time slot should be routed at the end 
of the time slot. Each node is located according to its coordinates (x,y) from 1 
to N. 

The algorithm attempts first to route the packets according to their shortest
path. If no solution exists, the "configuration" method is run so as to
minimize the total number of deflections. Packets to be deflected are chosen
at random. Note that one deflection at a node induces an increase in path
length of two additional links.



5.3 Principle of the Analysis 



The key point is that 2 kinds of behaviour are to be observed. Packets which
are in the same row/column as their destination have only one possible output.
On the other hand, all other packets have 2 possible outputs. The latter
are called type-2 packets, the former being type-1 packets.

The probability for a packet to be type-1 or type-2, under our uniformity 
assumptions, depends only on the distance between the packet and its destina- 
tion. The model consists in tracking the number of type-1 and type-2 packets 
at distance k from their destination. From this all quantities of interest are 
derived. 
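
The type of a packet, and the way the type mix depends on the distance, can be illustrated on a small torus. The sketch below uses hypothetical function names. It also ignores the special case of a packet exactly half-way around an even-sized torus, which has two equally short directions along one axis; that simplification is an assumption of the sketch.

```python
def torus_offsets(src, dst, n):
    """Shortest hop counts along each axis between src and dst."""
    dx, dy = abs(src[0] - dst[0]), abs(src[1] - dst[1])
    return min(dx, n - dx), min(dy, n - dy)


def packet_type(node, dest, n):
    """1 if node shares a row or column with dest, otherwise 2."""
    dx, dy = torus_offsets(node, dest, n)
    if dx == 0 and dy == 0:
        raise ValueError("packet is already at its destination")
    return 1 if dx == 0 or dy == 0 else 2


def type_counts_at_distance(n, k):
    """Count destinations of each type at distance k from node (0, 0)."""
    counts = {1: 0, 2: 0}
    for x in range(n):
        for y in range(n):
            dx, dy = torus_offsets((0, 0), (x, y), n)
            if dx + dy == k:
                counts[packet_type((0, 0), (x, y), n)] += 1
    return counts


print(type_counts_at_distance(10, 3))  # {1: 4, 2: 8}
```

For a 10 x 10 torus and distance 3, four of the twelve positions are type-1 and eight are type-2, so under uniform traffic the type depends on the distance only, as stated above.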

Let $n_1$, $n_2$ be the total average number of packets of type 1 and 2, and
$n = n_1 + n_2$. Let $D = E(d)$ denote the average sojourn time of a packet (in
number of hops). The rate at which packets enter (and exit from) the network
is $\Lambda = N^2 \lambda$. Little's relation gives $n = \lambda N^2 D$. Now,
there are $4N^2$ available slots containing n packets, so that $u = n/4N^2$.
Eliminating n gives the relation (see [3]):

$$u = \frac{\lambda D}{4}. \qquad (8)$$

The $u_i$ can be derived from u and the overall population:

$$u_i = u \, \frac{n_i}{n_1 + n_2}. \qquad (9)$$
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Consider a test packet. According to the Bernoulli cissumptions, the proba- 
bility that it meets ji,j 2 packets in the next node is simply: 

$$P(j_1, j_2) = \ldots \qquad (10)$$
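
One way such a meeting probability could be evaluated is sketched below. The multinomial form is an assumption of this sketch, not a formula taken from the text: it supposes that each of the three other transit inputs independently carries a type-1 packet with probability u1, a type-2 packet with probability u2, or is idle. The function name is hypothetical.

```python
from math import factorial


def meeting_probability(j1: int, j2: int, u1: float, u2: float,
                        inputs: int = 3) -> float:
    """Assumed multinomial probability of meeting j1 type-1, j2 type-2 packets."""
    idle = inputs - j1 - j2
    if idle < 0:
        return 0.0
    ways = factorial(inputs) // (factorial(j1) * factorial(j2) * factorial(idle))
    return ways * u1 ** j1 * u2 ** j2 * (1.0 - u1 - u2) ** idle


# The probabilities over all (j1, j2) with j1 + j2 <= 3 sum to one.
total = sum(meeting_probability(j1, j2, 0.2, 0.3)
            for j1 in range(4) for j2 in range(4 - j1))
print(total)
```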

The deflection probability of the packet, given its type and given the number
$(j_1, j_2)$ of possible conflicting packets, may be estimated by a
combinatorial analysis. The following table summarizes the results (see [11]
for a comprehensive analysis):


    j1    j2    p1(j1,j2)    p2(j1,j2)
    0     0     0.0          0.0
    0     1     0.0          0.0
    0     2     0.073        0.021
    0     3     0.223        0.087
    1     0     0.125        0.0
    1     1     0.172        0.026
    1     2     0.279        0.082
    2     0     0.229        0.031
    2     1     0.297        0.082
    3     0     0.316        0.078

Table 1: Deflection probabilities vs configurations
The deflection probability of a type-i packet is given by:

$$p_i^d = \sum_{j_1, j_2} P(j_1, j_2) \, p_i(j_1, j_2), \qquad (11)$$

and the average deflection probability (i.e. the ratio of the number of
deflections observed during a slot to the total number of packets being
processed) is:

$$P^d = \frac{n_1 p_1^d + n_2 p_2^d}{n_1 + n_2}. \qquad (12)$$

Since each deflection lengthens the path to the destination by 2 links, the
following relation holds:

$$D = D_0 + 2 P^d D \quad \text{or} \quad D = \frac{D_0}{1 - 2P^d}. \qquad (13)$$

$D_0$ is easy to derive with the uniform assumption: $D_0 \approx N/2$.
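
The relation between the mean path length and the deflection probability is easy to evaluate in both directions. The following sketch assumes the form D = D0 / (1 - 2 P^d) of equation (13); the function names are hypothetical.

```python
def effective_distance(d0: float, p_deflect: float) -> float:
    """Mean path length D for shortest distance d0 and deflection prob. P^d."""
    if not 0.0 <= p_deflect < 0.5:
        raise ValueError("P^d must lie in [0, 0.5) for a finite path length")
    return d0 / (1.0 - 2.0 * p_deflect)


def deflection_probability(d0: float, d: float) -> float:
    """Inverse relation: the P^d implied by an observed mean path length D."""
    return (1.0 - d0 / d) / 2.0


# Example: N = 10, so D0 is about 5; one deflection on average gives D = 7.
p = deflection_probability(5.0, 7.0)
print(p, effective_distance(5.0, p))
```

For a shortest distance of 5 links and a mean path of 7 links, the implied per-hop deflection probability is 1/7, and 7 hops at that probability give one deflection on average.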

5.4 Performance Analysis 

From the above elements, a set of equations is written which governs the
steady-state behaviour of the $n_i(k)$. At each time slot, new packets enter
the network
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Figure 4: Building the set of equations 14 at steady-state 



at some distance from their destination. At the same time, packets already in
the network move, either being deflected (their distance increases by 1) or
being successfully transmitted (their distance decreases).

In the diagram, the upper row represents the $n_1$'s, and the lower row the
$n_2$'s. Note that all packets at distance larger than N/2 are type-2: they
always have two outgoing links initiating a shortest path to their
destination. The set of equations (14) is built from the diagram. In the
equations, the notation $q_i = 1 - p_i^d$ is used for clarity. The
coefficients of the terms account for the possible behaviours. For instance, a
type-2 packet which is not deflected may remain type-2 with probability
$(k - 1)/k$, or may become type-1; a type-2 packet which is deflected remains
type-2; etc. We refer to [11] for the full justification, which is rather
lengthy (especially for the limit conditions, see k = N/2 - 1, N/2).



ni(l) 

nfk) 

ri2{k) 



ni(f - 1) 



ni(k) 

n2{k) 

n2{N) 



Wi(fc) + qifii(k + 1) + - 1) + ^n 2 {k + 1), 

U!2{k) + + 1 ) + P2^2{k - 1 ) + ^^ni{k - 1 ) 

2<Ar< ^ 

wi(y - 1) + ]|?rin2(y) + ^«i(y - 2) 

/iV . Q2(N-4)_^ rN\ , fN 



- 1 



W2(y- 1)4- 



«2(f)+P^n2(f-2)+ 



q2{N-4) 

N-1 

4-^ni(f -2) 



W2(y) 4- 92^2(y 4- 1) 4-P2”2(^ - 1) 4-pfni(y - 1) 

0 , 



(14) 



<^2{k) + q2n2{k 4-1)4- P^nqik - 1) ^ < k < N 

uj2{N)+pin2{N -\) 



Note that the set of equations cannot be solved directly, since the deflection
probabilities depend on the link loads $u_i$, which in turn depend on the
solution. So, an iterative method has been implemented which begins by
choosing an initial distribution for the ($n_i$) (the distribution without any
deflection has been successfully used), from which a first set of ($u_i$) is
drawn, giving a second set of ($n_i$), etc. The convergence is quite fast, and
the method is thus quite efficient.
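
The flavour of such an iteration can be shown on a deliberately simplified version of the model. The sketch below is not the system (14): it collapses the per-distance populations into a fixed share of type-1 packets (the hypothetical parameter type1_share), assumes a multinomial meeting probability over the three other inputs, and uses the values of Table 1 together with the relations u = lambda D / 4 and D = D0 / (1 - 2 P^d). All names are illustrative.

```python
from math import factorial

# Table 1: (j1, j2) -> (p1, p2), conditional deflection probabilities.
TABLE_1 = {
    (0, 0): (0.0, 0.0), (0, 1): (0.0, 0.0),
    (0, 2): (0.073, 0.021), (0, 3): (0.223, 0.087),
    (1, 0): (0.125, 0.0), (1, 1): (0.172, 0.026), (1, 2): (0.279, 0.082),
    (2, 0): (0.229, 0.031), (2, 1): (0.297, 0.082), (3, 0): (0.316, 0.078),
}


def meeting_probability(j1, j2, u1, u2):
    """Assumed multinomial probability over the three other inputs."""
    idle = 3 - j1 - j2
    ways = factorial(3) // (factorial(j1) * factorial(j2) * factorial(idle))
    return ways * u1 ** j1 * u2 ** j2 * (1.0 - u1 - u2) ** idle


def deflection_probabilities(u1, u2):
    """Weighted sums over Table 1 for type-1 and type-2 packets."""
    p1 = p2 = 0.0
    for (j1, j2), (t1, t2) in TABLE_1.items():
        w = meeting_probability(j1, j2, u1, u2)
        p1 += w * t1
        p2 += w * t2
    return p1, p2


def solve(lam, n, type1_share=0.3, tol=1e-10, max_iter=1000):
    """Fixed-point iteration on the mean path length D."""
    d0 = n / 2.0
    d = d0                      # start from the case without deflection
    for _ in range(max_iter):
        u = lam * d / 4.0       # link utilization from Little's relation
        if u >= 1.0:
            raise ValueError("link utilization reaches 1: load not sustainable")
        p1, p2 = deflection_probabilities(type1_share * u,
                                          (1.0 - type1_share) * u)
        pd = type1_share * p1 + (1.0 - type1_share) * p2
        d_new = d0 / (1.0 - 2.0 * pd)
        if abs(d_new - d) < tol:
            return u, pd, d_new
        d = d
        d = d_new
    raise RuntimeError("iteration did not converge")


u, pd, d = solve(lam=0.3, n=10)
print(u, pd, d)
```

Starting from the deflection-free path length, each pass recomputes the link load, the deflection probabilities and the path length until the latter stops changing. The full model would update the whole distributions of type-1 and type-2 packets by distance instead of a single scalar.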
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6 Validation 



Simulation experiments are conducted to compare results with the analytical
model. These comparisons are based on the deflection probability, the node
throughput and the length of the actual path. We assume a network with
N = 10 rows and columns.
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Figure 5: Comparison of node throughput: simulation versus analysis 

The first set of curves (Figure 5) displays the throughput obtained as a
function of the link load u. The solid line represents results from the
model, while the isolated points are simulation results. The precision of the
simulation runs, as estimated through the "method of blocks", is better than
10 %. The dotted line indicates the value of the carried traffic without any
deflection. This is the limit performance an infinite buffer system would
give rise to. This limit is given by equation (8) with $E(d) = E(d_0)$. An
order of magnitude can be obtained by assuming the typical distance to be
$D_0 = N/2$, that is:

$$\Lambda = N^2 \lambda \approx 8Nu. \qquad (15)$$

This result has already been given, see e.g. [3]. It shows that the carried
traffic increases linearly with N, that is, as the square root of the number
of nodes. This explains why the traffic per node decreases as the size
increases, as already mentioned (Section 2). Actually, the overall admissible
load increases, but the efficiency decreases, as for any meshed network.
In particular, in the model taken
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Figure 6: Comparison of mean length : simulation versus analysis 



here, the growth of the network (i.e., of N) is reflected in an increase in D, which 
corresponds to a very special kind of network evolution. 

Figure 6 shows the variation of the effective distance. For the upper limit, 
the distance is around 7 (in number of links). This implies that on the average 
a packet incurs one deflection (increasing the shortest path by 2). 

Figure [?] summarizes the previous two. The relation between D and $P^d$
makes it redundant; we present it nevertheless because deflection is the main
phenomenon. The deflection probability remains moderate even for high link
utilization.

On the whole, the Figures show that the results of the analytical model 
are in good agreement with simulations, slightly overestimating the influence of 
deflection. 

The results are also quite encouraging, as far as the deflection method is
concerned. They show that, in the configuration under study, deflection
remains a rather infrequent event (on average, one deflection incurred during
the packet's travel). Thus, these results tend to support the possibility of
using the deflection principle for all optical high-speed networks without
degrading the performance.

For instance, Figure ?? shows that the maximum achievable carried load for
the 10 x 10 network is around 0.5 per node, which is quite comparable with
figures a store-and-forward network would achieve. As a matter of fact,
congestion phenomena strongly limit the load in such networks, limiting u to
values around
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Figure 7: Comparison of deflection probability : simulation versus analysis 
0.7, resulting in values of $\lambda$ around 0.56, that is, only slightly greater.
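
As a quick check of the figure quoted above, assuming the relation u = λD/4 between link load and arrival rate, and a path length without deflection of D_0 = N/2 = 5 for the 10 x 10 network:

```latex
\lambda = \frac{4u}{D_0} = \frac{4 \times 0.7}{5} = 0.56
```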

7 Conclusions 

This study has concentrated on the use of deflection routing for bufferless
metropolitan area networks. Assuming a "Manhattan Street Network" topology,
it is shown that the so-called "configuration" routing algorithm exhibits
significantly better behavior than the simpler, sequential one. The
improvement in network performance, exemplified in Figure 3, largely justifies
the increase in algorithmic complexity.

Also, a quite efficient analytical method is proposed, and validated through a
simulation study. The numerical example shows clearly the appealing features
of deflection routing, as compared with more traditional store-and-forward
mechanisms operating in buffered networks. So, deflection routing could appear
as a possible solution for electronic-based networks, where buffer
availability is not a critical issue.
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Abstract 

The goal of this paper is an algorithm to find the CLR and cell delay
characteristics for realistic ATM buffer systems with a large number N of
reasonably complicated Markov modulated Bernoulli sources with M states per
source. Matrix geometric methods have already reduced the complexity of
finding these QoS measures by using rate matrices. However, the dimensions of
the rate matrices still grow exponentially with N. In ATM the number of
sources at the input of these buffer systems is often so large that
straightforward application of matrix geometric methods is impossible, since a
set of $(N-1)M^{2N}$ nonlinear equations would have to be solved to obtain the
rate matrices. This paper presents a technique, based on the spectral analysis
of the rate matrix, which makes computing the CLR for large N feasible. Using
the Kronecker product structure of the blocks of the transition matrix, the
problem of finding the eigenvalues of the rate matrix can be reduced to
solving a set of N + 1 nonlinear scalar equations in the eigenvalue s of the
rate matrix and in N dummy variables. Only 1 equation contains all variables;
the other N equations each contain 1 dummy variable and s. Most of the
eigenvalues can be found by using repeated substitution in the set of
equations, but for some unstable roots we need a combination of repeated
substitution and Powell's Direction Set Method minimizing a squared error
function. The eigenvectors of the rate matrix are obtained by solving as many
sets of M linear scalar equations
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as there are different eigenvalues of the rate matrix. It is possible to further 
reduce the complexity of the calculations by only considering eigenvectors 
corresponding to the most significant eigenvalues. 



1 INTRODUCTION 

Any ATM network contains several buffer systems (such as statistical
multiplexers, rate adapters, switching elements, ...) where congestion can
occur. If some sources put more cells in a buffer than the server can take
out, and if this situation holds for a fairly long time, the buffer can become
full and newly arriving cells are lost. Also, the delay in a filled buffer is
usually a substantial part of the total delay cells experience when they
travel through the network. Therefore anyone who is interested in the
performance of an ATM network will want to look at the Cell Loss Ratio or the
cell delay characteristics (CTD, CDV, ...) in one buffer for a given arrival
traffic. Which performance measure has to be computed depends on the traffic
type. For real-time video (such as video-conferencing) good delay
characteristics are more valuable than a small CLR, whereas for the
transmission of data (e.g. via ABR) it is more important to receive the
information correctly and without losses.

The performance criteria described above are stationary measures, which
have the advantage that they can be computed from the equilibrium distribution
of the buffer occupancy. This is the probability that, starting from any
initial condition at t = 0, the buffer will contain n cells after an
infinitely long time. Non-stationary measures can also be used to describe the
performance of a network (such as the correlation between cell losses during a
burst of arriving cells), but these fall outside the scope of this paper.

In this paper we are mostly interested in the effects on the buffer occupancy
of VBR traffic (video and data) which can be modelled by Markov Modulated
processes (such as MMBP). These processes also have the advantage that they
can approximate long range dependence over several orders of magnitude of
the time scale, e.g. (Robert and Le Boudec “Can...” 1995) or (Robert and
Le Boudec “Stock...” 1995). The results in this paper can be used for the
dimensioning of buffers, the study of resource allocation, the construction of
flow control schemes, etc. Flow control will be the topic of a future paper:
the effects of flow control will be considered there by introducing one or
more thresholds in the buffer. As soon as the buffer occupancy exceeds a
threshold, the sources (usually buffer systems upstream of the arrival traffic
stream) are requested to lower the rates at which they send cells to the
buffer.
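
To make the source model concrete, a Markov Modulated Bernoulli source can be sketched in a few lines. The class below is only an illustration under assumed conventions (a cell is emitted according to the current state, then the state changes); the class name, its attributes and the example parameters are hypothetical and do not come from the paper.

```python
import random


class MMBPSource:
    """Discrete-time Markov Modulated Bernoulli source with M states."""

    def __init__(self, transition, emit_prob, rng, state=0):
        self.transition = transition  # M x M row-stochastic matrix
        self.emit_prob = emit_prob    # per-state cell emission probability
        self.rng = rng
        self.state = state

    def step(self) -> int:
        """Advance one slot; return 1 if a cell is emitted, else 0."""
        cell = 1 if self.rng.random() < self.emit_prob[self.state] else 0
        # Draw the next modulating state from the current row.
        r, acc = self.rng.random(), 0.0
        for nxt, p in enumerate(self.transition[self.state]):
            acc += p
            if r < acc:
                self.state = nxt
                break
        return cell


# Two-state bursty source: mostly silent in state 0, active in state 1.
rng = random.Random(42)
src = MMBPSource([[0.95, 0.05], [0.10, 0.90]], [0.0, 0.8], rng)
cells = [src.step() for _ in range(20)]
print(cells)

# N independent M-state sources have M**N joint states.
print(2 ** 20)
```

The last line hints at why a direct treatment becomes impractical: twenty two-state sources already give more than a million joint modulating states.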

Several methods for the computation of the equilibrium distribution of 
the buffer occupancy have already been developed as in (Blondia and Casals 
1992), (Ye and Li 1994), (Xiong and Bruneel 1996) and (Naoumov et al. 1996). 
The Matrix Geometric Method (Neuts 1981) is widely used because of its 
numerical stability and its computational efficiency. In (Neuts 1981) it is 







developed for Quasi-Birth-and-Death processes and infinite buffers, and in 
(Latouche and Ramaswami 1993) the authors propose an algorithm which re- 
duces the complexity of calculating the rate matrix in a logarithmic way. In a 
modified form this matrix geometric method can be applied for finite buffers 
(Hajek 1982). In (Wuyts and Boel 1996) the authors propose an extension of 
the matrix geometric method for a finite buffer with several arrival streams, 
where the QBD-structure is no longer valid. 

Unfortunately, even in the infinite buffer case the method of (Wuyts and
Boel 1996) still involves the computation of $N-1$ rate matrices of dimension
$M^N \times M^N$, yielding a set of $(N-1)M^{2N}$ non-linear equations. In an
ATM network N can be very large (e.g. N = 20), such that the computation
of the rate matrices becomes infeasible. The main contribution of this paper
is to present a technique, based on the spectral analysis of a rate matrix,
which makes computing the CLR for large N possible, i.e. for realistic ATM
arrival streams. The advantage of the spectral analysis of this matrix is that
the matrix itself does not have to be computed and that the method makes use of
the inner Kronecker product structure of the matrices $A_i$ (corresponding to the
arrival of i cells in 1 slot) in order to compute the $(N-1)M^N$ eigenvalues as
efficiently as possible. Finding the eigenvalues of the rate matrix can then be
reduced to solving a set of $N+1$ nonlinear scalar equations in the eigenvalue s
of the rate matrix and N dummy variables. The computational complexity can
be further reduced by computing only the dominant eigenvalues and their
corresponding eigenvectors. Finding heuristic rules for deciding which eigenvalues
are dominant (e.g. in the computation of the CLR) is a topic of current
research. The method above can be compared with (Mitrani and Chakka 1995),
where an analogous spectral method is described for a continuous time infinite
buffer problem. This paper can therefore be seen as an extension to the
discrete time finite and infinite buffer case with several independent Markovian
arrival streams, implying a Kronecker product structure.

In section 2 we describe a model for the VBR arrival traffic at an ATM
buffer. In section 3 we introduce an algorithm for the spectral decomposition
of the rate matrices and in section 4 the eigenvalues and corresponding left
eigenvectors are used to compute the equilibrium distribution. The algorithm is
used to find some numerical results for a B-ISDN example in section 5, and
section 6 shows briefly how to extend the algorithm to the finite buffer case.



2 MODEL 

Consider one buffer in an ATM buffer system (such as a statistical multiplexer, 
a switching element, ...) with N independent Markov modulated Bernoulli 
sources at the input and 1 server at the output. The server removes one cell 
per time slot from the buffer if it is not empty. The transition matrix of the
modulating Markov process for source i is given by an M x M-dimensional
matrix $Q_i$; $P_i$ is an M x M-dimensional diagonal matrix whose j-th diagonal
element $p_j(i)$ is the probability that source i will send a cell during a slot
in which its state is j.

Since we want to study the behaviour of the buffer occupancy, we need the
Markov process $Y_k = (B_k, S_k(1), \ldots, S_k(N)) \in \{0, 1, \ldots\} \times
\{1, \ldots, M\} \times \cdots \times \{1, \ldots, M\}$; $B_k$ denotes the number
of cells in the buffer during slot k and $S_k(i)$ represents the state of the
modulating Markov process of source i during slot k. The collection of all
states $Y_k$ with $B_k = b$ cells in the buffer is called the “level” b.

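Let us consider for now only the infinite buffer case (later we show briefly
how the technique must be modified for the finite buffer case). The transition
matrix P of $Y_k$ is:

\[
P = \begin{pmatrix}
A_0 + A_1 & A_2 & A_3 & A_4 & \cdots & A_N     & 0       & 0       & 0      & \cdots \\
A_0       & A_1 & A_2 & A_3 & \cdots & A_{N-1} & A_N     & 0       & 0      & \cdots \\
0         & A_0 & A_1 & A_2 & \cdots & A_{N-2} & A_{N-1} & A_N     & 0      & \cdots \\
0         & 0   & A_0 & A_1 & \cdots & A_{N-3} & A_{N-2} & A_{N-1} & A_N    & \cdots \\
\vdots    &     &     & \ddots &     &         &         &         & \ddots &
\end{pmatrix}
\tag{1}
\]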



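where each $A_n$ is an $(M^N \times M^N)$-dimensional matrix, constructed as follows:

\[
\begin{aligned}
A_N     &= P_1Q_1 \otimes P_2Q_2 \otimes \cdots \otimes P_NQ_N \\
A_{N-1} &= (I-P_1)Q_1 \otimes P_2Q_2 \otimes \cdots \otimes P_NQ_N \\
        &\quad + P_1Q_1 \otimes (I-P_2)Q_2 \otimes P_3Q_3 \otimes \cdots \otimes P_NQ_N \\
        &\quad + \cdots \\
        &\quad + P_1Q_1 \otimes P_2Q_2 \otimes \cdots \otimes P_{N-1}Q_{N-1} \otimes (I-P_N)Q_N \\
        &\;\;\vdots \\
A_0     &= (I-P_1)Q_1 \otimes (I-P_2)Q_2 \otimes \cdots \otimes (I-P_N)Q_N
\end{aligned}
\tag{2}
\]

and $\otimes$ denotes the Kronecker product.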

The structure in (2) consists of sums of Kronecker products because the 
arrival stream consists of independent Markov modulated Bernoulli processes. 
This can be easily seen if one considers a simple example, e.g. N = 2 sources 
with M = 2 states per source: 



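\[
P_i = \begin{pmatrix} p_1(i) & 0 \\ 0 & p_2(i) \end{pmatrix},
\qquad
Q_i = \begin{pmatrix} q_{1,1}(i) & q_{1,2}(i) \\ q_{2,1}(i) & q_{2,2}(i) \end{pmatrix}
\]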



The transition matrix P now consists of 3 diagonals, containing the matrices
$A_0$, $A_1$ and $A_2$. $(A_0)_{1,1}$ is the probability that no cells arrive at
the buffer during slot k, source 1 will be in state 1 during slot k + 1 and
source 2 will be in state 1 during slot k + 1, under the condition that source 1
is in state 1 during slot k and that source 2 is in state 1 during slot k. This
probability equals
$(1 - p_1(1))\,q_{1,1}(1)\,(1 - p_1(2))\,q_{1,1}(2) = [(I-P_1)Q_1]_{1,1} \cdot [(I-P_2)Q_2]_{1,1}$.
In the same way we prove that:

\[
\begin{aligned}
(A_0)_{1,2} &= (1-p_1(1))\,q_{1,1}(1)\,(1-p_1(2))\,q_{1,2}(2) = [(I-P_1)Q_1]_{1,1} \cdot [(I-P_2)Q_2]_{1,2} \\
(A_0)_{1,3} &= (1-p_1(1))\,q_{1,2}(1)\,(1-p_1(2))\,q_{1,1}(2) = [(I-P_1)Q_1]_{1,2} \cdot [(I-P_2)Q_2]_{1,1} \\
(A_0)_{1,4} &= (1-p_1(1))\,q_{1,2}(1)\,(1-p_1(2))\,q_{1,2}(2) = [(I-P_1)Q_1]_{1,2} \cdot [(I-P_2)Q_2]_{1,2} \\
(A_0)_{2,1} &= (1-p_1(1))\,q_{1,1}(1)\,(1-p_2(2))\,q_{2,1}(2) = [(I-P_1)Q_1]_{1,1} \cdot [(I-P_2)Q_2]_{2,1} \\
(A_0)_{2,2} &= (1-p_1(1))\,q_{1,1}(1)\,(1-p_2(2))\,q_{2,2}(2) = [(I-P_1)Q_1]_{1,1} \cdot [(I-P_2)Q_2]_{2,2} \\
(A_0)_{2,3} &= (1-p_1(1))\,q_{1,2}(1)\,(1-p_2(2))\,q_{2,1}(2) = [(I-P_1)Q_1]_{1,2} \cdot [(I-P_2)Q_2]_{2,1} \\
(A_0)_{2,4} &= (1-p_1(1))\,q_{1,2}(1)\,(1-p_2(2))\,q_{2,2}(2) = [(I-P_1)Q_1]_{1,2} \cdot [(I-P_2)Q_2]_{2,2}
\end{aligned}
\]

This can be repeated for the third and fourth row of $A_0$ and for $A_1$ and $A_2$.
This proves that in Kronecker product notation the $A_i$ can be written as:

\[
\begin{cases}
A_2 = P_1Q_1 \otimes P_2Q_2 \\
A_1 = (I-P_1)Q_1 \otimes P_2Q_2 + P_1Q_1 \otimes (I-P_2)Q_2 \\
A_0 = (I-P_1)Q_1 \otimes (I-P_2)Q_2
\end{cases}
\]

Thus the Kronecker product expresses that the Markov process remembers 
the state of every individual source during a slot; this requires 4 states for the 
case of two 2-state sources. 
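
The construction of the matrices $A_n$ can be sketched numerically. The following Python fragment is only an illustration: the function name `arrival_matrices` and all numerical values are assumptions made for this example and do not come from the paper. It builds $A_0$, $A_1$ and $A_2$ for two 2-state sources by summing Kronecker products as in (2), and checks one entry of $A_0$ against the hand-written product above.

```python
import itertools

import numpy as np


def arrival_matrices(Ps, Qs):
    """Return [A_0, ..., A_N], where A_n collects the Kronecker terms of (2)
    in which exactly n of the N sources send a cell."""
    N = len(Ps)
    M = Ps[0].shape[0]
    I = np.eye(M)
    A = [np.zeros((M**N, M**N)) for _ in range(N + 1)]
    # sends[i] is 1 when source i emits a cell in the slot and 0 otherwise
    for sends in itertools.product((0, 1), repeat=N):
        term = np.ones((1, 1))
        for P, Q, send in zip(Ps, Qs, sends):
            term = np.kron(term, (P if send else I - P) @ Q)
        A[sum(sends)] += term
    return A


# Illustrative parameters only (not taken from the paper)
P1, Q1 = np.diag([0.9, 0.1]), np.array([[0.7, 0.3], [0.2, 0.8]])
P2, Q2 = np.diag([0.8, 0.05]), np.array([[0.6, 0.4], [0.5, 0.5]])

A0, A1, A2 = arrival_matrices([P1, P2], [Q1, Q2])

# Summing over the number of arrivals gives the joint transition matrix
# of the two modulating chains, which must be row-stochastic.
joint = A0 + A1 + A2
print(np.allclose(joint, np.kron(Q1, Q2)), np.allclose(joint.sum(axis=1), 1.0))

# Entry (1,2) of A_0 in the 1-based indexing of the text, written out by hand
by_hand = (1 - 0.9) * Q1[0, 0] * (1 - 0.8) * Q2[0, 1]
print(np.isclose(A0[0, 1], by_hand))
```

The composite state index follows the ordering implied by the Kronecker product: for two 2-state sources, states 1 to 4 correspond to the source-state pairs (1,1), (1,2), (2,1) and (2,2).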



3 SPECTRAL DECOMPOSITION OF THE RATE MATRICES 

The stationary probability distribution of P is the infinite row vector
$\pi = [\pi_0 \;\; \pi_1 \;\; \cdots]$, where every $\pi_b$ is a row vector with
dimension $M^N$ (corresponding to level b); assuming the average arrival rate
$\rho < 1$, $\pi$ is the unique solution of

\[
\begin{cases}
\pi P = \pi \\
\sum_{i=0}^{\infty} \pi_i e = 1
\end{cases}
\tag{3}
\]

where e is a column vector of ones.

In (Wuyts and Boel 1996) it is shown that $\pi$ has the following generalized
matrix geometric structure:

\[
\pi_n = \pi_{n-1}R_1 + \pi_{n-2}R_2 + \cdots + \pi_{n-N+1}R_{N-1},
\qquad n = N-1, N, \ldots
\tag{4}
\]

where the rate matrices $R_1, R_2, \ldots, R_{N-1}$ are uniquely defined as the
minimal nonnegative matrices which satisfy:

\[
\begin{cases}
R_{N-1} = A_N + R_{N-1}A_1 + R_{N-1}R_1A_0 \\
R_l = A_{l+1} + R_lA_1 + [R_lR_1 + R_{l+1}]A_0 & \text{if } 1 \le l \le N-2
\end{cases}
\tag{5}
\]

By collecting blocks of $(N-1) \times (N-1)$ matrices $A_i$ (starting at the top left
corner of P) and naming these blocks $B'$, $A'_0$, $A'_1$ and $A'_2$, one can reduce
the structure of P to a quasi-birth-and-death process as in (Neuts 1981). The
equilibrium distribution $\pi = [\pi'_0 \;\; \pi'_1 \;\; \pi'_2 \;\; \cdots]$, where the
$(N-1)M^N$-dimensional row vector
$\pi'_i = [\pi_{i(N-1)} \;\; \cdots \;\; \pi_{i(N-1)+N-2}]$, can be written in matrix
geometric form (Neuts 1981):

\[
\pi'_n = \pi'_0 R^n, \qquad n = 0, 1, \ldots
\tag{6}
\]

where R is the $(N-1)M^N \times (N-1)M^N$-dimensional rate matrix which is
the minimum nonnegative solution of the following set of nonlinear equations:

\[
R = A'_2 + RA'_1 + R^2A'_0
\tag{7}
\]

In both cases $\pi'_0 = [\pi_0 \;\; \pi_1 \;\; \cdots \;\; \pi_{N-2}]$ follows from the
boundary equations $\pi'_0 B' + \pi'_1 A'_0 = \pi'_0$ and the normalization
equation $\sum_{i=0}^{\infty} \pi_i e = 1$.
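
To make the cost of the matrix-based approach concrete, the sketch below solves the rate-matrix equation by successive substitution for the smallest case, N = 2, where a single rate matrix remains and (5) and (7) both reduce to $R = A_2 + RA_1 + R^2A_0$. The source parameters, the function name `rate_matrix` and the stopping rule are assumptions chosen for illustration; the paper does not prescribe this particular iteration. The last lines check numerically that every eigenvalue s of R is a root of $\det(A_2 + sA_1 + s^2A_0 - sI)$, which follows directly from multiplying the rate-matrix equation by a left eigenvector.

```python
import numpy as np

# Illustrative two-source example (N = 2, M = 2); the numbers are assumptions.
I2 = np.eye(2)
P1, Q1 = np.diag([0.5, 0.05]), np.array([[0.9, 0.1], [0.2, 0.8]])
P2, Q2 = np.diag([0.4, 0.10]), np.array([[0.7, 0.3], [0.4, 0.6]])

A0 = np.kron((I2 - P1) @ Q1, (I2 - P2) @ Q2)
A1 = np.kron((I2 - P1) @ Q1, P2 @ Q2) + np.kron(P1 @ Q1, (I2 - P2) @ Q2)
A2 = np.kron(P1 @ Q1, P2 @ Q2)


def rate_matrix(A0, A1, A2, tol=1e-13, max_iter=100_000):
    """Successive substitution R <- A2 + R A1 + R^2 A0, started from R = 0."""
    R = np.zeros_like(A2)
    for _ in range(max_iter):
        R_next = A2 + R @ A1 + R @ R @ A0
        if np.max(np.abs(R_next - R)) < tol:
            return R_next
        R = R_next
    raise RuntimeError("rate matrix iteration did not converge")


R = rate_matrix(A0, A1, A2)
eigenvalues = np.linalg.eigvals(R)

# Each eigenvalue s of R should be a root of det(A2 + s A1 + s^2 A0 - s I)
residuals = [abs(np.linalg.det(A2 + s * A1 + s**2 * A0 - s * np.eye(4)))
             for s in eigenvalues]
print(np.max(np.abs(eigenvalues)), max(residuals))
```

Even in this toy case the unknown is a full 4 x 4 matrix; the size of the unknown grows as $M^N \times M^N$ per rate matrix, which is what motivates working with eigenvalues instead.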

The main limitation of the previous methods is that for values of N and
M which would model a realistic ATM multiplexer the number of equations
in (5) or (7) becomes prohibitive, even though (5) and (7) can be solved
iteratively. N = 20 video sources, each with a 10-dimensional modulating state,
is a realistic example, but in (5) it gives $(N-1)M^{2N} \approx 2 \times 10^{41}$
nonlinear equations to be solved! The main contribution of this paper consists
in showing that by using the internal Kronecker product structure (2) of the
A-matrices, a relation between the $R_l$ in (5) and the R-matrix in (7), and the
eigenvalue decomposition of R, it is possible to reduce (5) or (7) to $N+1$
nonlinear equations in an eigenvalue of R and N dummy variables. These
equations are the same for all eigenvalues.

The relationship between the 2 methods above is shown by the following
equality:

\[
R = (R')^{N-1}
\tag{8}
\]

where $R'$ is a matrix with the same dimensions as R, defined as

\[
R' = \begin{pmatrix}
0      & 0 & 0 & \cdots & 0 & R_{N-1} \\
I      & 0 & 0 & \cdots & 0 & R_{N-2} \\
0      & I & 0 & \cdots & 0 & R_{N-3} \\
0      & 0 & I & \cdots & 0 & R_{N-4} \\
\vdots &   &   & \ddots &   & \vdots  \\
0      & 0 & 0 & \cdots & I & R_1
\end{pmatrix}
\tag{9}
\]

From (8) it follows that R and $R'$ have the same eigenvectors and that the
eigenvalues of R are the $(N-1)$-th power of the eigenvalues of $R'$. To keep
the notation simple, we assume that $R'$ is diagonalisable. In the general case
Jordan forms have to be considered, leading to very tedious calculations.

Substituting (9) in the eigenvalue equation $vR' = sv$ shows that each left
eigenvector v (corresponding to the eigenvalues $s_i$, $i = 1, \ldots, (N-1)M^N$)
has the following structure:

\[
v = [\,w \;\; sw \;\; s^2w \;\; \cdots \;\; s^{N-2}w\,]
\tag{10}
\]

where the $M^N$-dimensional vector w satisfies

\[
w\,[R_{N-1} + sR_{N-2} + \cdots + s^{N-2}R_1] = s^{N-1}w
\tag{11}
\]

Moreover the eigenvalues $s_i$ are the $(N-1)M^N$ roots of the polynomial
equation

\[
\det[R_{N-1} + sR_{N-2} + \cdots + s^{N-2}R_1 - s^{N-1}I] = 0
\tag{12}
\]

Unfortunately we cannot use (11) and (12) to solve the eigenvalue problem,
since the rate matrices $R_l$ are unknown. Left multiplying (7) with the left
eigenvector v expressed via (10), we find that w must satisfy

\[
wX(s) = w\,[A_N + sA_{N-1} + s^2A_{N-2} + \cdots + s^{N-1}A_1 + s^NA_0] = s^{N-1}w
\tag{13}
\]

A necessary condition for s to be an eigenvalue is that
$\det(X(s) - s^{N-1}I) = 0$. This characteristic equation is a polynomial of
degree between $(N-1)M^N$ and $NM^N$, depending on the rank of $A_0$. This means
that the characteristic equation can have more than $(N-1)M^N$ roots, but we can
show that there are exactly $(N-1)M^N$ solutions of $\det(X(s) - s^{N-1}I) = 0$
which satisfy $0 < |s| < 1$; these are the $(N-1)M^N$ eigenvalues of R. This can
be proven as follows: in (Neuts 1981) it is shown that all $(N-1)M^N$ eigenvalues
of R lie inside the unit circle, and that the $M^N$ eigenvalues of $R_\beta$
(defined in section 6 of this paper and in (Wuyts and Boel 1996)) lie inside or
on the unit circle. If the average load of the buffer is less than 1, exactly
one eigenvalue of $R_\beta$ will be 1; the others lie strictly inside the unit
circle. In section 6 it is also shown that the finite inverses of the
eigenvalues of $R_\beta$ satisfy the same characteristic equation as the
eigenvalues of R. We say only the finite inverses, because some of the
eigenvalues of $R_\beta$ (let us say k in total) might be 0 (because the rank of
$A_0$ equals $M^N - k$). This means that $\det(X(s) - s^{N-1}I) = 0$ has
$(N-1)M^N$ roots inside the unit circle (i.e. the eigenvalues of R) and
$M^N - k$ finite roots on or outside the unit circle (i.e. the inverses of the
eigenvalues of $R_\beta$ which are different from 0), including the root at 1.

Because of the Kronecker product structure of the $A_i$ in (2), $X(s)$ can be
decomposed as $X(s) = Q_1(s) \otimes Q_2(s) \otimes \cdots \otimes Q_N(s)$, where
$Q_i(s) = (P_i + s(I - P_i))Q_i$. Let us assume that w also has a Kronecker
product structure; then a sufficient condition for w and s to satisfy (13) is

\[
x_iQ_i(s) = \mu_i x_i, \qquad i = 1, \ldots, N
\tag{14}
\]
\[
\prod_{i=1}^{N} \mu_i = s^{N-1}
\tag{15}
\]
\[
w = x_1 \otimes x_2 \otimes \cdots \otimes x_N
\tag{16}
\]

Notice that $x_i$ is an M-dimensional vector and every $\mu_i$ is a dummy variable,
used only in order to decompose one large set (13) of $M^N$ linear equations in
w into N smaller sets (14) of M linear equations in $x_i$. The solution for s and
$\mu_i$ is given by the condition that the solution for each $x_i$ must differ
from 0. This yields the following set of $N+1$ nonlinear equations:

\[
\det(Q_i(s) - \mu_i I) = 0, \qquad i = 1, \ldots, N
\tag{17}
\]
\[
\prod_{i=1}^{N} \mu_i = s^{N-1}
\tag{18}
\]

The first N equations (17) are decoupled per source, since each equation only
contains the variable s and 1 variable $\mu_i$. So the set is only coupled through
the last equation (18).
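
The decomposition can be checked numerically on a small case. The sketch below uses randomly generated sources (N = 3, M = 2) whose parameters are purely illustrative, and helper names (`kron_all`, `Q_of_s`, `X_from_sum`) that are assumptions of this example. It verifies that the sum $A_N + sA_{N-1} + \cdots + s^NA_0$ equals the Kronecker product of the per-source matrices $Q_i(s)$, and that a Kronecker product of per-source left eigenvectors is a left eigenvector of $X(s)$ with eigenvalue $\prod_i \mu_i$. Condition (15) is then exactly the requirement that this product equals $s^{N-1}$.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
N, M = 3, 2
I = np.eye(M)

# Random illustrative sources: diagonal P_i with entries in (0, 1), stochastic Q_i
Ps = [np.diag(rng.random(M)) for _ in range(N)]
Qs = [q / q.sum(axis=1, keepdims=True) for q in rng.random((N, M, M))]


def kron_all(mats):
    """Kronecker product of a list of matrices, taken left to right."""
    out = np.ones((1, 1))
    for m in mats:
        out = np.kron(out, m)
    return out


def Q_of_s(P, Q, s):
    """Per-source matrix Q_i(s) = (P_i + s (I - P_i)) Q_i."""
    return (P + s * (I - P)) @ Q


def X_from_sum(s):
    """X(s) = sum_n s^(N-n) A_n, with A_n expanded term by term as in (2)."""
    total = np.zeros((M**N, M**N), dtype=complex)
    for sends in itertools.product((0, 1), repeat=N):
        term = kron_all([(P if send else I - P) @ Q
                         for P, Q, send in zip(Ps, Qs, sends)])
        total += s ** (N - sum(sends)) * term
    return total


s = 0.3 + 0.2j                      # arbitrary test point
X_sum = X_from_sum(s)
X_kron = kron_all([Q_of_s(P, Q, s) for P, Q in zip(Ps, Qs)])

# One left eigenpair (mu_i, x_i) per source, then w as in (16)
mus, xs = [], []
for P, Q in zip(Ps, Qs):
    vals, vecs = np.linalg.eig(Q_of_s(P, Q, s).T)  # left eigenvectors of Q_i(s)
    mus.append(vals[0])
    xs.append(vecs[:, 0][None, :])                  # row vector x_i
w = kron_all(xs)

print(np.allclose(X_sum, X_kron))
print(np.allclose(w @ X_kron, np.prod(mus) * w))
```

The second check holds for any s; only the values of s for which the product of the chosen $\mu_i$ equals $s^{N-1}$ solve (13).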



4 THE EQUILIBRIUM DISTRIBUTION 

If R is diagonalisable, its set of left eigenvectors (as given by (10)) is a basis
for the $(N-1)M^N$-dimensional space. So $\pi'_0$ can also be written as a linear
combination of these eigenvectors, and this yields a general expression for
the equilibrium distribution $\pi$:

\[
\pi_n = \sum_{i=1}^{(N-1)M^N} b_i s_i^n w_i, \qquad n = 0, 1, 2, \ldots
\tag{19}
\]

The unknown scalars $b_i$ are uniquely determined by the boundary equations
$\pi'_0 B' + \pi'_1 A'_0 = \pi'_0$ and the normalization equation
$\sum_{i=0}^{\infty} \pi_i e = 1$. Substitution of (19) in both equations yields

\[
bWU = 0
\tag{20}
\]
\[
\sum_{i=1}^{(N-1)M^N} b_i \frac{w_i e}{1 - s_i} = 1
\tag{21}
\]

and W, U and b are given by

\[
b = [\,b_1 \;\; b_2 \;\; \cdots \;\; b_{(N-1)M^N}\,]
\tag{22}
\]

[Equations (23) and (24), which define the matrices W and U, are not legible
in the source; the only surviving fragment shows that U contains the block
$A_0$.]
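
The normalization condition (21) can be recovered from (19) by summing a geometric series. The short derivation below is a sketch that assumes $|s_i| < 1$ for every eigenvalue, so that the series converges, and writes e for the all-ones column vector; the sums over i run over all $(N-1)M^N$ eigenvalues.

```latex
\begin{align*}
\sum_{n=0}^{\infty} \pi_n e
  &= \sum_{n=0}^{\infty} \sum_{i} b_i \, s_i^n \, (w_i e)
   = \sum_{i} b_i \, (w_i e) \sum_{n=0}^{\infty} s_i^n
   = \sum_{i} \frac{b_i \, (w_i e)}{1 - s_i} = 1 .
\end{align*}
```

Each term decays geometrically with the level n at rate $|s_i|$, which is why eigenvalues of small modulus contribute little at high buffer levels.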



Notice that the computation of b still involves solving a set of $(N-1)M^N$
linear equations in the same number of variables. For now we suggest solving
this set using classical methods, such as LU decomposition, but the structure
of W and U suggests that there exist more efficient methods, allowing us
to compute equilibrium distributions for larger N and M. Research on this
subject is in progress.



5 A B-ISDN EXAMPLE : COMPUTATION OF THE 
EIGENVALUES 

If the eigenvalues $s_i$ and the corresponding dummy variables $\mu_i$ are known,
the left eigenvectors $w_i$ can be easily computed from (14) and (16), and all
$b_i$ follow from the set of linear equations (20)-(21). Only the numerical
computation of the eigenvalues needs further explanation, since the equations
(17)-(18) are not linear.

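Let us first assume that the transition matrix $Q_i$ of the modulating Markov
process for source i has the following structure:

\[
Q_i = \begin{pmatrix}
1 - \sum_{j=2}^{M} \alpha_j(i) & \alpha_2(i)     & \alpha_3(i)     & \alpha_4(i) & \cdots & \alpha_M(i) \\
\beta_2(i)                     & 1 - \beta_2(i)  & 0               & 0           & \cdots & 0 \\
\beta_3(i)                     & 0               & 1 - \beta_3(i)  & 0           & \cdots & 0 \\
\vdots                         &                 &                 & \ddots      &        & \vdots \\
\beta_M(i)                     & 0               & \cdots          &             & 0      & 1 - \beta_M(i)
\end{pmatrix}
\tag{25}
\]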



It is shown in (Robert and Le Boudec “Can...” 1995) and (Robert and Le
Boudec “Stock...” 1995) that this assumption provides a good model for VBR
data sources on ATM networks such that long range dependence of these
sources is approximated very well over several orders of magnitude of time.
The time interval over which the long range dependent behaviour of this model
is valid can be increased by increasing the number M of states per source. Also
in the EXPERT project experiments have shown that the assumed structure
for $Q_i$ gives a good model for VBR video over ATM (ACTS 1997). It has been
observed that cell losses depend strongly on correlations between cell arrivals
during bursts of high rates. These bursts are generated by states j such that
$\alpha_j(i)$ and $\beta_j(i)$ are small and the diagonal elements of $Q_i$ are
close to 1.

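Another advantage of (25) is that the determinant of (17) is easy to compute:

\[
\det(Q_i(s) - \mu_i I) = a_1(i) \prod_{k=2}^{M} c_k(i)
  - \sum_{j=2}^{M} a_j(i)\, b_j(i) \prod_{k=2,\, k \neq j}^{M} c_k(i)
\tag{26}
\]
\[
c_k(i) = \bigl(p_k(i) + s(1 - p_k(i))\bigr)\bigl(1 - \beta_k(i)\bigr) - \mu_i,
\qquad k = 2, \ldots, M
\tag{27}
\]
\[
b_j(i) = \bigl(p_j(i) + s(1 - p_j(i))\bigr)\,\beta_j(i),
\qquad j = 2, \ldots, M
\tag{28}
\]
\[
a_j(i) = \bigl(p_1(i) + s(1 - p_1(i))\bigr)\,\alpha_j(i),
\qquad j = 2, \ldots, M
\tag{29}
\]
\[
a_1(i) = \bigl(p_1(i) + s(1 - p_1(i))\bigr)\Bigl(1 - \sum_{j=2}^{M} \alpha_j(i)\Bigr) - \mu_i
\tag{30}
\]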

where $p_k(i)$ is the k-th diagonal element of $P_i$. It is obvious that the
determinant in (17) is a polynomial of degree M in $\mu_i$ as well as in s. We
want to emphasize here that the structure in (25) is not a necessary condition
for the validity of the method described in this paper. The method is still
valid for a general transition matrix $Q_i$. The only necessary condition for the
validity of (31)-(32) is the Kronecker product structure in (2), in other words:
the arrival stream must consist of independent Markovian processes. The
structure in (25) is only chosen to simplify the computation of the determinant
in (31) and because this structure is known to occur in practical situations.
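
The closed form of the determinant can be compared with a direct numerical evaluation. The sketch below assumes that the first row of (25) is completed so that $Q_i$ is row-stochastic and that $a_1(i)$ in (30) is the full (1,1) entry of $Q_i(s) - \mu_i I$, i.e. that it includes the term $-\mu_i$. The function names and all parameter values are hypothetical, chosen only to exercise the formula.

```python
import numpy as np


def build_Q(alpha, beta):
    """Transition matrix with the structure of (25).

    alpha[j-2] and beta[j-2] hold alpha_j and beta_j for states j = 2..M.
    """
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    M = alpha.size + 1
    Q = np.zeros((M, M))
    Q[0, 0] = 1.0 - alpha.sum()
    Q[0, 1:] = alpha
    Q[1:, 0] = beta
    Q[np.arange(1, M), np.arange(1, M)] = 1.0 - beta
    return Q


def det_closed_form(p, alpha, beta, s, mu):
    """Evaluate (26) using the quantities defined in (27)-(30)."""
    p, alpha, beta = (np.asarray(v, float) for v in (p, alpha, beta))
    d = p + s * (1.0 - p)                      # diagonal of P + s(I - P)
    a1 = d[0] * (1.0 - alpha.sum()) - mu       # (30), assumed to include -mu
    a = d[0] * alpha                           # (29)
    b = d[1:] * beta                           # (28)
    c = d[1:] * (1.0 - beta) - mu              # (27)
    total = a1 * np.prod(c)
    for j in range(c.size):
        total -= a[j] * b[j] * np.prod(np.delete(c, j))
    return total


# Hypothetical 4-state source and an arbitrary complex test point (s, mu)
p = np.array([0.9, 0.5, 0.2, 0.05])
alpha = [0.02, 0.03, 0.01]
beta = [0.1, 0.05, 0.2]
s, mu = 0.4 + 0.1j, 0.7 - 0.2j

Q = build_Q(alpha, beta)
direct = np.linalg.det(np.diag(p + s * (1.0 - p)) @ Q - mu * np.eye(4))
closed = det_closed_form(p, alpha, beta, s, mu)
print(np.isclose(direct, closed))
```

The closed form needs on the order of $M^2$ multiplications as written, and it exposes the determinant as an explicit polynomial in $\mu_i$ and s.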

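The numerical method for the computation of s is based upon repeated
substitution. The initial step is $s^{(0)} = 0$. The iterative algorithm is given by

\[
\det\bigl(Q_i(s^{(k)}) - \mu_i^{(k)} I\bigr) = 0, \qquad i = 1, \ldots, N
\tag{31}
\]
\[
\bigl(s^{(k+1)}\bigr)^{N-1} = \prod_{i=1}^{N} \mu_i^{(k)}
\tag{32}
\]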

Since (31) is a polynomial of degree M in $\mu_i^{(k)}$, we see that $\mu_i^{(k)}$
is not a function of $s^{(k)}$ unless we order the M roots of the polynomial (e.g.
with increasing real part), give these roots indices from 1 to M and during each
iteration choose the root with the same index. We do this for all N equations
and substitute the solutions in the right member of (32). Again $s^{(k+1)}$ is not
a function of the product in the right member of (32), since the $(N-1)$th root
has $N-1$ (complex) solutions. Here too we can order the roots with increasing
angles with respect to the real axis in the complex plane, give them indices
from 1 to $N-1$ and again choose the root with the same index during every
iteration step. Notice that the N polynomial equations in (31) all have M roots,
such that the number of combinations in the product in (32) equals $M^N$. For
each of these combinations there are $N-1$ roots s, so we find $(N-1)M^N$
solutions for s, which is exactly the number of solutions for s within the unit
circle.
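
The repeated substitution (31)-(32) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the roots of (31) are obtained here as eigenvalues of $Q_i(s)$ with NumPy rather than from the polynomial (26), the branch of the $(N-1)$th root is selected by rotating the principal root, and the function names, tolerance and iteration limit are assumptions. The parameters are those of the on/off example used in this section (N = 4, M = 2, $\alpha_2 = 0.04$, $\beta_2 = 0.01$, $p_1 = 0.95$, $p_2 = 0.01$).

```python
import numpy as np


def q_of_s(P, Q, s):
    """Return Q_i(s) = (P_i + s (I - P_i)) Q_i."""
    I = np.eye(P.shape[0])
    return (P + s * (I - P)) @ Q


def repeated_substitution(Ps, Qs, mu_index, root_index=0,
                          tol=1e-12, max_iter=10_000):
    """Iterate (31)-(32) from s = 0 for one choice of root indices.

    mu_index[i] selects which root of (31) is kept for source i
    (0 = smallest real part); root_index selects the branch of the
    (N-1)th root in (32), counted from the principal root.
    """
    N = len(Ps)
    branch = np.exp(2j * np.pi * root_index / (N - 1))
    s = 0j
    for _ in range(max_iter):
        product = 1.0 + 0j
        for P, Q, idx in zip(Ps, Qs, mu_index):
            mus = np.linalg.eigvals(q_of_s(P, Q, s))   # roots of (31)
            mus = mus[np.argsort(mus.real)]            # fixed ordering
            product *= mus[idx]
        s_next = branch * product ** (1.0 / (N - 1))   # one solution of (32)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    raise RuntimeError("repeated substitution did not converge")


# N = 4 identical on/off sources with the parameters of the example
P = np.diag([0.95, 0.01])
Q = np.array([[0.96, 0.04], [0.01, 0.99]])
s = repeated_substitution([P] * 4, [Q] * 4, mu_index=[1, 1, 0, 0])
print(s)
```

With two sources taking the root of larger real part and two taking the smaller one, the iteration settles near s = 0.89516, in agreement with the eigenvalue 0.8951560 reported for this example. Other choices of `mu_index` and `root_index` address the remaining combinations, and a `RuntimeError` signals the kind of non-convergence discussed next.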

Although the iterative algorithm converges to the correct solution in most
cases, it can get caught in a periodic cycle in some cases. During tests we
noticed that this periodic cycle in the case of nonconvergence always passed the
neighbourhood of the correct solution. This suggests the following extension
of the iterative method with a minimization algorithm. Let $\mu_{i,l}(s)$ denote
the solution $\mu_i$ of the ith equation in (17) with the lth largest real part,
such that $\mu_{i,l}(s)$ is a well defined function of s. In that case we can
define the following squared error function:

\[
e^2(s) = \Bigl|\, s^{N-1} - \prod_{i=1}^{N} \mu_{i,l}(s) \Bigr|^2
\tag{33}
\]

It is obvious that the minima of $e^2(s)$, which equal 0, are reached at the
solutions s of (17)-(18). If the iterative method (31)-(32) does not converge, it
is possible to compute $e^2(s)$ during every iteration step and find the
neighbourhood of the correct root by choosing the minimum in the periodic cycle.
It has been found experimentally that Powell’s Direction Set method (Press
et al. 1989) always finds the correct minimum.

The following table shows the eigenvalues $s_i$ for N = 4 identical sources with
M = 2 states per source (on/off sources) with $\alpha_2 = 0.04$, $\beta_2 = 0.01$,
$p_1 = 0.95$ and $p_2 = 0.01$, such that the average load $\rho = 0.792$.

[Table, only partly legible in the source. Column headings: eigenvalues,
multiplicity, convergence. Surviving entries: eigenvalues 0.003007288,
0.2067359, 0.8951560, -0.004877811 - j0.002913184 and
-0.009916731 - j0.001100736; multiplicities 1, 4 and 6; convergence flags
y, y and n.]

Since we consider identical sources, there are only 15 different eigenvalues,
instead of $(N-1)M^N = 48$, but some of them have a multiplicity larger than
1. The matrix $R'$ is still diagonalizable, because every eigenvalue has as many
different eigenvectors as its multiplicity. The reason can be found in (14) and
(16): the N = 4 equations in (14) are now identical, but depending on which
of the roots $\mu_i$ we choose to substitute in the product in (15), the
eigenvectors can still be different. Taking permutations of the Kronecker
product in (16) gives different vectors w corresponding to the same eigenvalue
s. The number of these permutations is the multiplicity of s and equals
$\binom{N}{l}$ for M = 2; l (or N - l) is the number of variables $\mu_i$ which
have the same value in the product of (15).

In the table we also pointed out for which eigenvalues the repeated substi- 
tution does not converge. Notice that nonconvergence only occurs for 3 of the 
15 eigenvalues which are very close to each other, such that the squared error 
function has very steep slopes in the neighbourhood of these values. Neverthe- 
less Powell’s Direction Set method has no problem converging to the correct 
solutions after using repeated substitution to come in their neighbourhood. 

Figures 1 and 2 show the stationary distribution of the buffer occupancy
(as a measure for the delay characteristics) and the CLR for the same example
as discussed above. The figures compare the results for the finite and infinite
buffer case. In the infinite buffer case the figures show several curves, taking
into account different numbers of eigenvalues. It can be seen that reliable
estimates of the CLR can be obtained using only a fraction of the eigenvalues
for the computation of the results, neglecting the other eigenvalues as if they
were 0. The curves taking into account all eigenvalues show the exact results.
The other curves take into account only the positive real eigenvalues (the
computation of these is always stable) or only 1 eigenvalue (the maximum
positive real eigenvalue). The curves for the CLR in particular are close to
each other, which implies that not all eigenvalues need to be computed.



Figure 1 The probability P(b) that there are b cells in the buffer as a
function of b (horizontal axis: number of cells in the buffer, 0 to 140) for
N = 4 identical sources with M = 2 states per source and B = 128;
$\alpha_2 = 0.04$, $\beta_2 = 0.01$, $p_{on} = 0.95$ and $p_{off} = 0.01$, $\rho = 0.792$.

Figure 2 CLR as a function of buffer size B (horizontal axis: buffer size)
for N = 4 identical sources with M = 2 states per source;
$\alpha_2 = 0.04$, $\beta_2 = 0.01$, $p_{on} = 0.95$ and $p_{off} = 0.01$, $\rho = 0.792$.



Current research involves finding heuristic rules for deciding which eigenvalues
to compute. In any case it is obvious that good approximations can be obtained
using a small number of eigenvalues, but that a significant improvement
(figures 1 and 2 have a logarithmic scale) can be achieved by using more
than one eigenvalue.



6 EXTENSION OF THE ALGORITHM TO THE FINITE 
BUFFER CASE 

In (Wuyts and Boel 1996) it is shown that the stationary probability
distribution for a finite buffer is the superposition of 2 waves:
$\pi_n = \alpha_n + \beta_n$. Indeed, in the finite buffer case (3) can be
interpreted as a linear difference equation with boundary conditions at the left
and the right boundary. The general solution can be written as a superposition
of the responses to the left and the right boundary conditions respectively. A
physical analogy is as follows. The first wave $\alpha_n$ propagates from left to
right and carries the effects of an empty buffer, while $\beta_n$ propagates from
right to left and carries the effects of a full buffer. The first term is
created by the same equations (4) and (5) as for $\pi_n$ in the infinite buffer
case. This means that the spectral analysis for $\alpha_n$ is the same as
described above. The second term is given by $\beta_n = \beta_B R_\beta^{B-n}$,
where B is the length of the finite buffer. The rate matrix $R_\beta$ fulfills
the following equation:

\[
R_\beta = A_0 + R_\beta A_1 + R_\beta^2 A_2 + \cdots + R_\beta^N A_N
\tag{34}
\]

This means that the eigenvectors $w_\beta$ of the rate matrix $R_\beta$ must
satisfy $w_\beta s_\beta^N X(1/s_\beta) = s_\beta w_\beta$, such that the
corresponding eigenvalues $s_\beta$ can be found as in (17)-(18), with
substitution of $Q_i(s)$ in (17) by $(s_\beta P_i + I - P_i)Q_i$ and with
substitution of $s^{N-1}$ in (18) by $s_\beta$.
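
The eigenvalue condition stated above follows from (34) in a few lines. The derivation below is a sketch that uses only (34), the definition of $X(s)$ in (13) and the left-eigenvector relation $w_\beta R_\beta = s_\beta w_\beta$, where $R_\beta$ denotes the rate matrix of (34); it assumes $s_\beta \neq 0$.

```latex
\begin{align*}
s_\beta w_\beta = w_\beta R_\beta
  &= w_\beta \left[ A_0 + R_\beta A_1 + \cdots + R_\beta^N A_N \right] \\
  &= w_\beta \left[ A_0 + s_\beta A_1 + \cdots + s_\beta^N A_N \right] \\
  &= s_\beta^N \, w_\beta \left[ A_N + s_\beta^{-1} A_{N-1} + \cdots + s_\beta^{-N} A_0 \right]
   = s_\beta^N \, w_\beta \, X(1/s_\beta) .
\end{align*}
```

Dividing by $s_\beta^N$ gives $w_\beta X(1/s_\beta) = (1/s_\beta)^{N-1} w_\beta$, so $1/s_\beta$ satisfies the same relation (13) as the eigenvalues of the infinite buffer case. This is why the nonzero eigenvalues of the rate matrix in (34) appear as inverses of the roots of the characteristic equation that lie on or outside the unit circle.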

If the average load $\rho$ is less than 1, one of the solutions $s_\beta$ equals 1.
The other eigenvalues lie inside the unit circle, and when constructed correctly
they must equal the inverse 1/s of the solutions of
$\det(X(s) - s^{N-1}I) = 0$ which lie on or outside the unit circle.
Nevertheless, when using repeated substitution the convergence will be better
when looking for solutions within the unit circle, and therefore the equations
within section 6 should be used instead of the equations in the previous
sections.



7 CONCLUSION 

The main contribution of this paper is to reduce further the computational
burden for a technique, introduced in (Wuyts and Boel 1996), to compute
the CLR and delay characteristics for realistic VBR arrival traffic in an ATM
buffer. In (Wuyts and Boel 1996) the algorithm involves the iterative
computation of $N-1$ rate matrices of dimensions $M^N \times M^N$ (for an
infinite buffer). The algorithm introduced in this paper computes the
$(N-1)M^N$ eigenvalues and the corresponding left eigenvectors of 1 larger
matrix, containing all these rate matrices, without having to compute this
matrix itself. All the eigenvalues can be obtained by finding all solutions of a
set of $N+1$ nonlinear equations in s and in N dummy variables. The first N
equations are decoupled per source in the sense that they contain only s and 1
dummy variable. The last equation contains all $N+1$ variables.

A further reduction of the dimensions might be obtained by computing only
a fraction of the $(N-1)M^N$ terms in (19). When computing e.g. the CLR we
are only interested in the values of the equilibrium distribution close to a full
buffer. Many of the terms in (19) are usually so small in the full buffer region
that they can be neglected. This happens e.g. for the terms corresponding
to eigenvalues s with a small amplitude. During the iteration step (31) we
have to choose which of the solutions $\mu_i^{(k)}$ we have to substitute in (32).
By choosing the solutions with the largest amplitudes we find the eigenvalue with
the largest amplitude, etc. This means that we only have to compute a fraction
of the eigenvalues. Unfortunately, depending on the length of the buffer some
of the small eigenvalues might not be negligible because the corresponding
values of $b_i$ in (19) might be large, such that the corresponding terms are
still substantial in the full buffer region. The selection of which eigenvalues to
compute is still a topic of current research.

Abstract

We examine an ATM multiplexer that is able to support two classes 
of service having qualities specified at the cell level. The first is a Guar- 
anteed Bandwidth (GB) service which ensures very small cell loss proba- 
bilities and cell delays. The amount of statistical multiplexing permitted 
for such connections is therefore quite limited. The second is one suited 
to non-real-time Variable Bit Rate (VBRnr) sources where much longer 
delays can be tolerated. In this case, larger buffers and higher levels 
of statistical multiplexing can be exploited to increase the utilization 
of the output link. To accommodate these differences, we consider a 
multiplexer with two buffers, one for each service class, where the GB 
connections have service priority. Specifically, the low-priority VBRnr 
traffic is served only when there are no cells in the buffer for the GB 
connections. Both types of traffic are modeled by On/Off sources. The 
principal aim of the investigation is to analyze the steady-state behav- 
ior of the low-priority buffer with respect to two hypotheses concerning 
the guaranteed traffic. The first presumes that the GB connections are 
allocated according to their peak bit rates; in this case, the analysis is 
exact. The second hypothesis permits partial statistical multiplexing 
of the GB traffic; here, the exact method is extended so as to provide 
an approximate solution. A simpler model is then considered for this 
purpose, which suffices if the low-priority buffer is sufficiently large. In 
conclusion, some attention is devoted to the transient behavior of this 
buffer, where we find that congestion probabilities can exceed those 
experienced under steady-state conditions. 




Keywords: ATM, multiplexer, service priority, analysis 



1. INTRODUCTION 

Broadband multiservice networks are intended to provide a variety of 
voice, data, and video services, where their corresponding requirements 
can differ considerably. For example, the loss of an ATM cell is quite 
acceptable for a voice connection while it may cause the retransmission 
of a large quantity of cells in the case of a data transfer. Accordingly, the 
ability of a network to discriminate between cells belonging to different 
service classes is advantageous, permitting its adaptation to class-specific 
needs. In particular, this can avoid inefficiencies associated with serv- 
ing connections uniformly according to the most stringent requirements, 
thereby increasing network utilization. 

Generally, the extent to which a common broadband transmission 
medium can support multiple services depends essentially on perfor- 
mance considerations. During the past decade, a great deal of attention 
has been devoted to characterizing such services in terms of quality re- 
quirements and, in turn, developing efficient means of integration that 
can guarantee specified service qualities (see Gopal et al., 1992; Bonomi
and Fendick, 1995, for example). More sophisticated queueing disciplines,
as an alternative to traditional FIFO scheduling, have also been
investigated for this purpose, particularly those involving some form of
priority mechanism (Bonomi et al., 1990; Kroner et al., 1991; Zhang,
1993; Meyer et al., 1993).

One of the important consequences of this effort has been the
standardization of a non-real-time Variable Bit Rate (VBRnr) service (ATM
Forum, 1996). This service class, originally developed for data transfer,
has very strict loss-probability requirements but has no constraints with 
regard to transmission delays. Although the latter permits large buffers 
and good bandwidth utilization, if network resources are to be shared 
with Guaranteed Bandwidth (GB) services (those with strict delay re- 
quirements), the GB connections require special treatment. An obvi- 
ous solution in this regard is a service priority mechanism that favors 
GB sources without compromising the loss requirements for the VBRnr 
sources. Specifically, the investigation that follows considers the use of 
dedicated buffers for this purpose, one for each service class, where the 
VBRnr traffic is served only when there are no cells in the buffer for the 
GB connections. 

Section 2 examines an architecture of this kind, with the assumption 
of a single server (the channel) and two finite-capacity buffers. As
suggested by the above remarks, the capacity of the low-priority (VBRnr)
buffer will typically be much larger than that employed for the
high-priority GB traffic. The analysis concerns the occupancy distribution
of the low-priority buffer with respect to two alternative assumptions 
concerning the allocation of high-priority sources. In the first case we 
assume that there is no statistical multiplexing of the high-priority GB 
traffic, i.e., these connections are admitted until the sum of their peak 
bit rates is equal to the channel capacity. Under the second hypothesis, 
we allow partial statistical multiplexing of the GB traffic but require that 
the congestion probability at the burst level be less than a target loss 
probability of 10“^^ (see Rasmussen et al., 1991, for example). Solution 
algorithms, based on an appropriate stochastic model, are then devel- 
oped for each hypothesis, where the first admits to an exact analysis and 
the second to an approximate solution. 

Results of applying these algorithms are then presented in Section 3, 
indicating the effects of various On/Off traffic assumptions on the
distribution of the VBRnr buffer. In the case of the approximate solution
method, comparisons are made with simulation data. Comparisons are 
also made with distributions obtained from a multiplexer without pri- 
ority for both classes of traffic. The results show that, as the capacity 
of the VBRnr buffer increases, these distributions approach those given 
by the algorithms of Section 2. Accordingly, if the VBRnr buffer is 
sufficiently large, one can use this simpler approach to approximate its 
distribution. In particular, this permits the use of approximate methods 
such as those described in Sohraby, 1992; Acampora and Zhang, 1992. 
Section 3 concludes with a brief study of the transient behavior of the
low-priority buffer, where we find that congestion probabilities can ex- 
ceed those experienced under steady-state conditions. The final section 
(Section 4) then summarizes the approach and main results of the entire 
investigation. 

2. MATHEMATICAL MODEL AND 
ANALYSIS 

Consider the system described in Figure 9.1, where buffers $B_0$ and $B_1$
have finite capacities $Q_0$ and $Q_1$, respectively, and are served by a
common channel with capacity $C$. Buffer $B_0$ is reserved for GB
connections, which have very low delay requirements and have priority in
accessing the channel relative to cells that enter $B_1$. The latter are
presumed to derive from VBRnr connections, which have no delay
requirements and are served only when buffer $B_0$ is empty. In keeping
with the notation for the buffers, we refer to the GB and VBRnr traffic as
being type 0 and type 1, respectively. The service discipline for both
buffers is FIFO.




Figure 9.1 Service priority multiplexer. 

More precisely, traffic of type $j$ ($j = 0, 1$) is assumed to be a set of
$N_j$ homogeneous On/Off sources, where each individual source is modeled
by a 2-state Markov process. If a type-$j$ source is active (the On state),
cells arrive periodically during time slots that are separated by some
specified integer $R_j \ge 1$, where the values of $R_0$ and $R_1$ may
differ. In other words, an active type-$j$ source transmits cells at a peak
rate equal to $C/R_j$. If a source of either type is idle (the Off state)
then no cells are transmitted. Accordingly, if we let
$T = \{0, 1, 2, \ldots\}$ denote the model's (discrete) time base, where a
time instant $t \in T$ is interpreted as the $t$th time slot, then cells
from a type-$j$ source can arrive only during slots $t$ that are
non-negative integer multiples of $R_j$. Hence, for each traffic class, all
$N_j$ of the type-$j$ connections are synchronized in the sense that cell
arrivals and source-state transitions can occur only at times

$$t = sR_j, \qquad s \in \{0, 1, 2, \ldots\}. \tag{9.1}$$

Although alternative assumptions could be made with regard to the relative
phasing of sources (including the possibility of random phasing),
the justification of any such assumption is not obvious. On the other 
hand, tight synchronization of the kind assumed above, although not re- 
alistic from a practical viewpoint, results in worst-case buffer congestion 
and cell delays. Our subsequent analysis is therefore conservative and, 
in this sense, the constraints imposed by (9.1) appear to be reasonable. 

Due to the presumed On/Off nature of individual sources, the number
of consecutive periods (each of length $R_j$) during which a type-$j$
source remains active is geometrically distributed with mean $L_j$. If,
further, we let $I_j$ denote the mean value of the Off time (which likewise
has a geometric distribution and, again, is quantified in units equal to
$R_j$), then the activity $\rho_j$ of a type-$j$ source is determined by
the relation $\rho_j = L_j/(I_j + L_j)$. Alternatively, if we choose to
specify $\rho_j$ and $L_j$ then the mean Off time $I_j$ is uniquely
determined. The latter choice, together with our synchronization
assumption, implies that the type-$j$ traffic entering buffer $B_j$
($j = 0, 1$) is completely characterized by the values of four parameters,
namely $N_j$, $R_j$, $\rho_j$, and $L_j$.
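
The relation between activity, mean burst length, and mean Off time can be made concrete with a short sketch. The function name and the dictionary keys below are hypothetical, and the per-period transition probabilities assume geometric sojourn times counted from one period upward (so a mean of $L_j$ periods corresponds to a leave probability of $1/L_j$), which the text does not state explicitly.

```python
def source_parameters(C_mbps, R, rho, L):
    """Derive per-source quantities from the period R, activity rho and mean burst L.

    Illustrative helper (not from the original text). Sojourn times are assumed
    geometric on {1, 2, ...}, measured in periods of R slots.
    """
    if not (0.0 < rho < 1.0) or L < 1 or R < 1:
        raise ValueError("need 0 < rho < 1, L >= 1 and R >= 1")
    mean_off = L * (1.0 - rho) / rho        # solves rho = L / (I + L) for I
    if mean_off < 1.0:
        raise ValueError("mean Off time below one period; no valid geometric law")
    return {
        "peak_rate_mbps": C_mbps / R,       # one cell every R slots while active
        "mean_off_periods": mean_off,       # I_j, in units of R slots
        "p_on_to_off": 1.0 / L,             # leave the On state, mean On time L
        "p_off_to_on": 1.0 / mean_off,      # leave the Off state, mean Off time I
    }


if __name__ == "__main__":
    # Channel of 150 Mbps and period 5, as in the type-0 sources of Section 3.
    print(source_parameters(150.0, 5, 0.5, 20))
```

With a 150 Mbps channel and $R_0 = 5$, the peak rate is 30 Mbps, which agrees with the value quoted for the GB sources in Section 3.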

Because of the stringent delay requirement for type-0 (GB) traffic,
the queueing of such cells must be kept to a minimum. This requires a
specific control on the number $N_0$ of connections permitted for this
traffic class. Accordingly, the investigation that follows considers each
of two alternative hypotheses regarding the allocation of type-0
connections.

H1: Peak Allocation. In this case, the number of high-priority
connections is restricted such that it does not exceed the period $R_0$,
i.e., $N_0 \le R_0$. Since the sources are synchronized (9.1), at each
potential arrival time $sR_0$, there are at most $N_0$ cell arrivals of
type 0. Hence, one can easily verify the following observation. If
the capacity of buffer $B_0$ is at least $N_0$ then the number of cells
$q_t$ in $B_0$ at the end of slot $t$, given that $m$ type-0 sources are
active, is

$$q_t = \max\{m - (t \bmod R_0),\, 0\}. \tag{9.2}$$

Taking the capacity of $B_0$ to be equal to the number of type-0
sources (i.e., $Q_0 = N_0$), the above implies that $q_t \le Q_0$ for all
$t \in T$. Hence, no high-priority cells are lost under hypothesis H1.
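
Equation (9.2) can be checked against a direct slot-by-slot simulation of the high-priority buffer. The sketch below is illustrative only: the function names are hypothetical, and it assumes (as described later in this section) that a departure occurs at the beginning of a slot and that arrivals follow it.

```python
def b0_occupancy_formula(m, t, R0):
    """Occupancy of B0 at the end of slot t according to (9.2)."""
    return max(m - (t % R0), 0)


def b0_occupancy_simulated(active, R0):
    """Simulate B0 slot by slot.

    active[s] is the number of active type-0 sources during period s
    (each entry must not exceed R0, i.e. peak allocation is assumed).
    """
    q, trace = 0, []
    for t in range(len(active) * R0):
        if q > 0:
            q -= 1                      # departure at the beginning of the slot
        if t % R0 == 0:
            q += active[t // R0]        # synchronized batch of arrivals
        trace.append(q)                 # occupancy observed at the end of the slot
    return trace


if __name__ == "__main__":
    R0, active = 5, [3, 0, 5, 2, 5]
    trace = b0_occupancy_simulated(active, R0)
    agree = all(trace[t] == b0_occupancy_formula(active[t // R0], t, R0)
                for t in range(len(trace)))
    print("formula matches simulation:", agree)
```

The agreement relies on $m \le R_0$ in every period: the buffer then drains completely before the next batch arrives, which is the property that makes the inference about $B_0$ possible under H1.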

H2: Cell-level Statistical Multiplexing. In this case, we suppose
that $N_0 > R_0$ while imposing the following restriction. Let $X_0$
be the random variable whose value is the number of active type-0
sources under steady-state conditions. Assuming further that
$Q_0 = R_0$, we then want to ensure that
$P[X_0 > R_0] < P_{\mathrm{loss}}$, where $P_{\mathrm{loss}}$ is the target
loss probability. The value of $P[X_0 > R_0]$ can be determined from the
steady-state distribution of the (composite) source model; this value
increases as $N_0$ becomes larger.
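
As an illustration of the H2 admission rule, the sketch below evaluates $P[X_0 > R_0]$ and searches for the largest admissible $N_0$. It assumes that the sources are independent, so that in steady state each is active with probability $\rho_0$ and $X_0$ is binomial; the function names are hypothetical and the target value passed in the example is chosen only for illustration.

```python
from math import comb


def congestion_probability(N0, R0, rho0):
    """P[X0 > R0] for X0 ~ Binomial(N0, rho0) (independent sources assumed)."""
    return sum(comb(N0, k) * rho0**k * (1.0 - rho0)**(N0 - k)
               for k in range(R0 + 1, N0 + 1))


def max_connections(R0, rho0, p_loss):
    """Largest N0 such that P[X0 > R0] < p_loss, searching upward from N0 = R0."""
    N0 = R0
    while congestion_probability(N0 + 1, R0, rho0) < p_loss:
        N0 += 1
    return N0


if __name__ == "__main__":
    # Illustrative target only; the value is an assumption of this sketch.
    print(max_connections(R0=15, rho0=0.05, p_loss=1e-9))
```

The search terminates because the tail probability grows with $N_0$, which is the monotonicity noted in the text.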

Under either of the above hypotheses, we note that the maximum
value of $N_0$ does not depend on $L_0$. Moreover, the occupancy
distribution of buffer $B_0$ is independent of the low-priority (type-1)
traffic and, hence, is easily obtained. Accordingly, the more interesting
distribution is that of buffer $B_1$, since it can reveal the advantages of
statistical multiplexing with respect to the remaining available
bandwidth. In what follows, we show how this distribution may be evaluated
without having to account for the behavior of $B_0$. As mentioned in
Section 1, under hypothesis H1, this analysis is exact; if H2 is presumed,
the resulting solution is approximate.

For these purposes, we define a finite-state stochastic process with
respect to the time base $T = \{0, 1, 2, \ldots\}$ where, as noted earlier,
an element $t \in T$ is interpreted as the time of the $t$th slot. As such,
these times also correspond to transmission instants on the output link.
More precisely, unless both buffers are empty at the end of slot $t - 1$, a
cell is transmitted at the beginning of slot $t$. Following such a
departure (if it exists), a type-$j$ cell may arrive depending on (9.1) and
whether a source of that type is active. The buffer occupancies are then
observed at the end of the time slot. With these assumptions, a state of
the process can be described by a 3-tuple $(q, m, n)$, where $q$ is the
number of cells in $B_1$, $m$ is the number of active type-0 sources, and
$n$ is the number of active type-1 sources. Accordingly,
$0 \le q \le Q_1$, $0 \le m \le N_0$, and $0 \le n \le N_1$. To describe
the probabilistic nature of this process, for $t \in T$, we let
$p_t(q, m, n)$ denote the probability of being in state $(q, m, n)$ at time
$t$. The dependence of this probability on $t$, even when $t$ is large, is
due to the fact that type-$j$ sources can change state only at times given
by (9.1). Hence, except in the degenerate case where $R_j = 1$ for both
traffic classes, these times constitute a proper subset of $T$.

To determine the values of $p_t(q, m, n)$, it is also necessary to know
whether the server is able to accept a low-priority (type-1) cell for
service at time $t$. Because the occupancy of the high-priority buffer
$B_0$ is not included as part of the model's state, such information must
be inferred from the type-0 source states. In the case of peak allocation
(hypothesis H1), this is possible; indeed, in a given slot $t$ with the
system in state $(q, m, n)$, the server can transmit a low-priority cell if
and only if $B_0$ was empty at the end of the previous slot, i.e.,
$(t - 1) \bmod R_0 \ge m'$, where $m'$ is the number of active type-0
sources at time $t - 1$; this follows from (9.2). On the other hand, if
even a limited amount of statistical multiplexing is used to allocate the
GB connections (hypothesis H2), this kind of inference is no longer
possible. Nevertheless, via an approximate solution method that involves a
modified type-0 source model, we find that the same algorithm can be
applied.

To describe this algorithm, let $A_j$ denote the transition matrix for the
active sources of type $j$ ($j = 0, 1$) and let $a_j(k, l)$ denote its
$(k, l)$ entry. In other words, $a_j(k, l)$ is the conditional probability
that $l$ type-$j$ sources are active at a potential arrival time $sR_j$,
given that $k$ were active at the previous potential arrival time
$(s - 1)R_j$ (where $0 \le k, l \le N_j$ and $s = 1, 2, 3, \ldots$). Since
sources of a given type are synchronized, the formulas for these
probabilities are identical to those derived for successive-slot
transitions of combined On/Off sources (see Bonomi et al., 1992, for
example). Accordingly, under hypothesis H1, the state probability at
time $t$ can be expressed recursively as a function of the state
probabilities at time $t - 1$. More precisely, if we let
$c(t) = (t - 1) \bmod R_0$ then, for $q < Q_1$, we have the following
case-by-case formulations of the state-occupancy probabilities at time $t$.



Case 1: $t$ is an integer multiple of neither $R_0$ nor $R_1$. In this
case, there are no arrivals and no change in the number of active sources.
If $B_1$ is nonempty at time $t - 1$ and the channel is available, the
low-priority buffer has one less cell at time $t$. Accordingly, the value
of $p_t(q, m, n)$ can be formulated in terms of the previous slot's
distribution as follows.

$$p_t(q,m,n) = \begin{cases}
p_{t-1}(q,m,n) & \text{if } c(t) < m \\
p_{t-1}(q,m,n) + p_{t-1}(q+1,m,n) & \text{if } c(t) \ge m \text{ and } q = 0 \\
p_{t-1}(q+1,m,n) & \text{if } c(t) \ge m \text{ and } q > 0
\end{cases}$$



Case 2: $t$ is an integer multiple of $R_0$ but not of $R_1$. Here, only GB
sources can change state. The low-priority buffer $B_1$ can transmit a cell
only if the previous number of high-priority arrivals was less than $R_0$.
Hence, the recursive formulation in this case is

$$p_t(q,m,n) = \begin{cases}
\displaystyle\sum_{m'=0}^{R_0-1} p_{t-1}(q+1,m',n)\,a_0(m',m)
 + \sum_{m'=R_0}^{N_0} p_{t-1}(q,m',n)\,a_0(m',m) & \text{if } q > 0 \\[2ex]
\displaystyle\sum_{q'=0}^{1}\sum_{m'=0}^{R_0-1} p_{t-1}(q',m',n)\,a_0(m',m)
 + \sum_{m'=R_0}^{N_0} p_{t-1}(0,m',n)\,a_0(m',m) & \text{if } q = 0
\end{cases}$$



where a sum is understood to have value 0 if the lower limit exceeds the 
upper limit. 



Case 3: $t$ is an integer multiple of $R_1$ but not of $R_0$. In this case,
low-priority sources can change state and we can have a corresponding
number of arrivals entering the low-priority buffer.

$$p_t(q,m,n) = \begin{cases}
\displaystyle\sum_{n'=0}^{N_1} p_{t-1}(q-n,m,n')\,a_1(n',n)
 & \text{if } c(t) < m \text{ and } q \ge n \\[2ex]
\displaystyle\sum_{q'=0}^{1}\sum_{n'=0}^{N_1} p_{t-1}(q',m,n')\,a_1(n',n)
 & \text{if } c(t) \ge m \text{ and } q = n \\[2ex]
\displaystyle\sum_{n'=0}^{N_1} p_{t-1}(q-n+1,m,n')\,a_1(n',n)
 & \text{if } c(t) \ge m \text{ and } q > n
\end{cases}$$

Case 4: $t$ is an integer multiple of both $R_0$ and $R_1$. This is the most
complicated case since, during such a slot, cells may arrive at both buffers
and source states may change for each type of traffic.

$$p_t(q,m,n) = \begin{cases}
\displaystyle\sum_{m'=0}^{R_0-1}\sum_{n'=0}^{N_1}
 p_{t-1}(q-n+1,m',n')\,a_0(m',m)\,a_1(n',n)
 + \sum_{m'=R_0}^{N_0}\sum_{n'=0}^{N_1}
 p_{t-1}(q-n,m',n')\,a_0(m',m)\,a_1(n',n) & \text{if } q > n \\[2ex]
\displaystyle\sum_{q'=0}^{1}\sum_{m'=0}^{R_0-1}\sum_{n'=0}^{N_1}
 p_{t-1}(q',m',n')\,a_0(m',m)\,a_1(n',n)
 + \sum_{m'=R_0}^{N_0}\sum_{n'=0}^{N_1}
 p_{t-1}(0,m',n')\,a_0(m',m)\,a_1(n',n) & \text{if } q = n \\[2ex]
0 & \text{if } q < n
\end{cases}$$

The equations for a full buffer ($q = Q_1$) are a natural extension of
the above relationships. Regarding Cases 1 and 3, it should be noted
that these are indeed possible, provided the period of the type-0 sources
is non-trivial ($R_0 > 1$). Therefore, due to the dependence on time (via
$c(t)$) that exists in these cases, the resulting model is generally a
non-homogeneous Markov process. Moreover, this process is periodic as a
consequence of the source periods $R_0$ and $R_1$, precluding convergence
of $p_t$ (as $t \to \infty$) to a stationary distribution. On the other
hand, once $t$ becomes sufficiently large, the distributions of $p_t$ will
repeat periodically with a period equal to the least common multiple
$\mathrm{lcm}(R_0, R_1)$ of periods $R_0$ and $R_1$. These distributions
can thus be obtained by computing the distributions for each $t$, starting
from an arbitrary distribution at time $t = 0$, until their difference in
two successive periods is negligible. More precisely, if we let
$\ell = \mathrm{lcm}(R_0, R_1)$, these computations are iterated until the
condition

$$\frac{\left| p_t(q,m,n) - p_{t-\ell}(q,m,n) \right|}{p_t(q,m,n)} < \epsilon$$


is satisfied for each slot in the period and for each admissible state of 
the system. Typically, e is taken to be 10“^. 
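
The recursion and its stopping rule can be sketched in a "forward" form, in which the probability mass of each state at slot $t - 1$ is pushed to its successor states at slot $t$; for $q < Q_1$ this is equivalent to Cases 1 to 4. Several points are assumptions of this sketch rather than statements of the text: the function names are hypothetical, the On/Off sojourn times are taken as geometric with leave probabilities $1/L_j$ and $1/I_j$, cells that would overflow $B_1$ are simply dropped (one reading of the "natural extension" to a full buffer), and the tolerance `eps` is illustrative.

```python
from math import comb, gcd

import numpy as np


def transition_matrix(N, L, rho):
    """a(k, l): k sources active at one potential arrival time, l at the next."""
    off = 1.0 / L                          # On -> Off (mean On time L, assumed)
    on = rho / (L * (1.0 - rho))           # Off -> On, i.e. 1/I with rho = L/(I+L)
    A = np.zeros((N + 1, N + 1))
    for k in range(N + 1):
        for i in range(k + 1):             # i of the k active sources stay On
            stay = comb(k, i) * (1 - off) ** i * off ** (k - i)
            for j in range(N - k + 1):     # j of the N-k idle sources turn On
                start = comb(N - k, j) * on ** j * (1 - on) ** (N - k - j)
                A[k, i + j] += stay * start
    return A


def step(p, t, R0, R1, A0, A1):
    """Advance p[q, m, n] from the end of slot t-1 to the end of slot t."""
    Q1, M, N = p.shape[0] - 1, p.shape[1], p.shape[2]
    c = (t - 1) % R0
    T0 = A0 if t % R0 == 0 else np.eye(M)  # type-0 sources may change state
    T1 = A1 if t % R1 == 0 else np.eye(N)  # type-1 sources may change state
    new = np.zeros_like(p)
    for q in range(Q1 + 1):
        for m_prev in range(M):
            served = 1 if c >= m_prev else 0   # B0 empty at end of slot t-1 (H1)
            q_left = max(q - served, 0)
            for n_prev in range(N):
                mass = p[q, m_prev, n_prev]
                if mass == 0.0:
                    continue
                for n in range(N):
                    arrivals = n if t % R1 == 0 else 0
                    q_new = min(q_left + arrivals, Q1)   # assumed: overflow dropped
                    new[q_new, :, n] += mass * T1[n_prev, n] * T0[m_prev, :]
    return new


def periodic_regime(p0, R0, R1, A0, A1, eps=1e-6, max_periods=100000):
    """Iterate until two successive periods agree to relative tolerance eps."""
    period = R0 * R1 // gcd(R0, R1)
    p, t, previous = p0, 0, None
    for _ in range(max_periods):
        current = []
        for _ in range(period):
            t += 1
            p = step(p, t, R0, R1, A0, A1)
            current.append(p)
        if previous is not None and all(
                np.all(np.abs(a - b) <= eps * a) for a, b in zip(current, previous)):
            return current                 # one distribution per slot of the period
        previous = current
    raise RuntimeError("no convergence within max_periods")
```

Starting from the empty, all-idle state (all mass on $(0, 0, 0)$) is a convenient choice here, since a relative tolerance is only meaningful for states that retain positive probability in the periodic regime.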

Let us now consider the alternative allocation hypothesis H2, which
permits a limited amount of statistical multiplexing for the high-priority
GB sources. Here we find that, by introducing a modified representation
of these sources, the above formulas can be applied to obtain highly
accurate approximate solutions. This "equivalent" type-0 traffic model
is constructed by reducing the number $N_0$ of high-priority sources to
$R_0$ and increasing the activity of each source such that the load on the
output link remains the same. More precisely, given that $N_0$, $R_0$,
$\rho_0$, and $L_0$ are the parameters of the type-0 sources (as defined
above), for the modified model we consider parameters $N_0'$, $R_0'$,
$\rho_0'$, and $L_0'$, where the first three are given by the equations

$$N_0' = R_0, \qquad R_0' = R_0, \qquad \rho_0' = \frac{N_0\,\rho_0}{R_0}. \tag{9.3}$$

Accordingly, for given values of the original parameters, the link
utilization due to type-0 sources is the same for the equivalent
representation (since $N_0'\rho_0' = N_0\rho_0$) and, moreover, the new
parameter values are such that hypothesis H1 (peak allocation) is satisfied.
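
A minimal sketch of the mapping (9.3) follows. The function name and the returned keys are hypothetical, and the mean burst length is simply carried over unchanged, which corresponds to the first of the two choices discussed next.

```python
def equivalent_type0_model(N0, R0, rho0, L0):
    """Map H2 parameters to the equivalent peak-allocated model of (9.3)."""
    if N0 <= R0:
        raise ValueError("H2 presumes N0 > R0; under H1 no mapping is needed")
    rho_eq = N0 * rho0 / R0            # keeps the offered type-0 load unchanged
    if rho_eq >= 1.0:
        raise ValueError("equivalent activity must stay below 1")
    return {"N0": R0, "R0": R0, "rho0": rho_eq, "L0": L0}


if __name__ == "__main__":
    # Type-0 parameters quoted for Figure 9.4 in Section 3.
    print(equivalent_type0_model(N0=45, R0=15, rho0=0.05, L0=100))
```

For the Figure 9.4 parameters this gives 15 sources with activity of about 0.15, in agreement with the modified values quoted in Section 3.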

Concerning $L_0'$, two possibilities are considered. The first is to let
$L_0' = L_0$; the second is to assign $L_0'$ a value such that the
asymptotic variance $v$ (see Jacobsen et al., 1990, for example) of the
arrival process, defined by

$$v = \lim_{t \to \infty} \frac{\mathrm{VAR}[N(0,t)]}{t},$$

where $N(0,t)$ is the number of cells arriving in $[0,t]$, remains the same
as that of the original traffic. In the section that follows we find that,
even under the most unfavorable load conditions, these choices lead to
distributions for buffer $B_1$ that are essentially identical.

3. RESULTS 

We begin by examining results under hypothesis H1, i.e., the case
where the GB sources require a bandwidth no greater than the channel
capacity $C = 150$ Mbps. As noted earlier, under these conditions the
algorithm yields an exact solution. Of initial interest is the possible
influence of GB-traffic burstiness, as reflected by the mean burst length
$L_0$, on the occupancy distribution of the low-priority buffer. Some
preliminary observations in this regard are the following.

A) For GB (type-0) traffic, it follows from equation (9.2) that buffer
$B_0$ is always empty prior to the entry of type-0 cells. Accordingly,
the occupancy distribution of $B_0$ is independent of the mean burst
length $L_0$ and is equal to the distribution of the channel's busy
period (when serving GB traffic only).

B) However, beyond a knowledge of this busy-period distribution (which
can be obtained by analysis in the absence of VBRnr sources, i.e.,
$N_1 = 0$), it is important to note that there is correlation between
the durations of two consecutive busy periods. This can be seen
by observing the type-0 cell departure process for two consecutive
intervals of length $R_0$, assuming that the busy period duration of
the first interval is equal to $l$ slots ($1 \le l \le N_0$). The
probability that the busy period in the second interval will have a
duration of $m$ slots is then given by $a_0(l, m)$. This implies that, in
general, the durations of consecutive busy periods are statistically
dependent; moreover, the extent of this correlation is a function of the
mean burst length $L_0$. Regarding the results in question, it is therefore
reasonable to expect that the value of $L_0$ will have an impact on
the distribution of the low-priority buffer.

This reasoning is borne out by Figures 9.2 and 9.3, which display
the occupancy distribution of the low-priority buffer $B_1$. The capacity
assumed for this buffer is $Q_1 = 400$ (note that these figures are
truncated, ignoring occupancies which have extremely low probabilities).
Recall that, under peak allocation (hypothesis H1), the capacity of the
high-priority buffer $B_0$ is assumed to coincide with the number of
high-priority connections, i.e., $Q_0 = N_0$; under hypothesis H2, we
presume that $Q_0 = N_0' = R_0$. With these choices, there are no type-0
cell losses under H1 and negligible type-0 losses under H2. Specifically,
the type-0 traffic assumed for Figure 9.2 consists of $N_0 = 3$ GB sources,
each with a peak rate of 30 Mbps (i.e., $R_0 = 5$) and activity
$\rho_0 = 0.5$. For the sake of observing effects of burstiness, two values
are considered for $L_0$, namely 20 and 100. The low-priority traffic
assumed for this figure is given by the parameter values $N_1 = 10$,
$R_1 = 15$, $\rho_1 = 0.5$, and $L_1 = 100$. In Figure 9.3, a different mix
of traffic is considered, with more GB sources ($N_0 = 5$), fewer VBRnr
sources ($N_1 = 4$), and a lower activity for both classes
($\rho_0 = \rho_1 = 0.1$). The periods are the same as for Figure 9.2
($R_0 = 5$, $R_1 = 15$) but the burstiness differs considerably, i.e., the
comparison here is for values of $L_0$ equal to 5 and 200, with an assumed
mean burst length of $L_1 = 10$ for the VBRnr traffic.

In both figures, we see that the burstiness of the high-priority sources
has an appreciable influence on the occupancy distribution of the
low-priority buffer. Moreover, if the relative change in the value of $L_0$
is increased (e.g., the 40x increase considered in Figure 9.3 compared with
the 5x change for Figure 9.2), this influence becomes more severe.
Accordingly, these observations imply (assuming H1) that, although the
distribution of buffer $B_0$ does not depend on the mean burst length $L_0$
of type-0 traffic, the value of $L_0$ needs to be determined since it has
an appreciable effect on the occupancy distribution of the low-priority
buffer $B_1$.

Figure 9.2 Influence of GB-source burstiness on buffer $B_1$; $L_0$ = 20, 100.

Figure 9.3 Influence of GB-source burstiness on buffer $B_1$; $L_0$ = 5, 200.



Turning now to results of using the approximate algorithm under
hypothesis H2, for these instances we choose a small value for $\rho_0$,
since it poses the worst condition for the approximation. This is due to
the fact that, with higher loads, the number of sources that can be
accommodated by the solution method proposed in Rasmussen et al., 1991
decreases rapidly toward the number permitted using peak allocation.
Examining again the occupancy distribution of the low-priority buffer, the
traffic parameter values assumed for Figure 9.4 are $N_0 = 45$,
$R_0 = 15$, $\rho_0 = 0.05$, and $L_0 = 100$ for the high-priority traffic
and $N_1 = 60$, $R_1 = 30$, $\rho_1 = 0.075$, and $L_1 = 200$ for the
low-priority sources. Applying equations (9.3), the modified values for the
first three type-0 traffic parameters are therefore $N_0' = 15$,
$R_0' = 15$, and $\rho_0' = 0.15$. Two choices are considered for $L_0'$
according to the discussion at the end of Section 2, i.e., $L_0' = 100$
(equating it with $L_0$) and $L_0' = 111$ (by equating variances). As can
be seen from Figure 9.4, the resulting approximate distributions for either
choice are essentially the same. It can also be observed that, for buffer
occupancies that are moderately large ($< 15$ cells), the approximations
agree almost exactly with a distribution obtained by simulating the exact
model. For larger occupancies, the approximate model unfortunately
underestimates the probabilities obtained by simulation. Note, however,
that this occurs in the low-probability region (tail) of the distribution
where simulation results themselves are more susceptible to errors of
estimation. In Figure 9.5, we consider type-0 sources with twice the mean
burst length (when compared to Figure 9.4); the remaining parameter values
are the same. Again, the choice of $L_0'$ has very little effect on the
solutions. Moreover, even with this amount of burstiness, the comparison
with simulation data remains similar to what was observed in Figure 9.4.

Figure 9.4 Validation of the approximate solution; $L_0$ = 100.



Figure 9.5 Validation of the approximate solution; $L_0$ = 200.

As noted earlier, algorithms such as these permit the evaluation of
large buffers at the expense of long solution times. The latter is a
serious problem, however, if such computations are to be part of a
connection admission control (CAC) algorithm, calling for execution times
on the order of tens of milliseconds. Prompted by this concern, we find
(applying the solution algorithms described above) that the behavior
of the low-priority buffer is similar to that of a single-buffer multiplexer
handling all the offered traffic (without prioritized service), provided the
capacity of $B_1$ is sufficiently large. Given that this condition is
satisfied, it is therefore possible to employ particularly fast
approximations such as those proposed in Sohraby, 1992; Acampora and Zhang,
1992.

To substantiate this claim, we first compare the low-priority buffer
distributions (exact vs. the simpler 1-buffer model) when the GB-required
bandwidth is equal to the channel capacity, thus satisfying hypothesis
H1. (Other assumptions stated in H1 also apply here, e.g., the capacity
$Q_0$ of buffer $B_0$ is equal to the number $N_0$ of type-0 connections.)
For traffic parameter values $N_0 = 5$, $R_0 = 5$, $\rho_0 = 0.2$,
$L_0 = 50$, $N_1 = 10$, $R_1 = 15$, $\rho_1 = 0.4$, and $L_1 = 100$,
Figure 9.6 shows that if the capacity of buffer $B_1$ is small, namely
$Q_1 = 20$, then the VBRnr buffer behavior is not well represented by the
simpler model. However, if the capacity of $B_1$ is substantially larger,
e.g., $Q_1 = 400$ as assumed in Figure 9.7, then for the same traffic
parameter values of Figure 9.6, we see that the results derived from the
one-buffer model agree quite closely with the results of the exact
algorithm.



The validity of this approach is likewise confirmed in Figure 9.8, where
$N_0 = 45$, $R_0 = 15$, $\rho_0 = 0.05$, $L_0 = 200$, $N_1 = 60$,
$R_1 = 30$, $\rho_1 = 0.075$, and $L_1 = 200$. In this instance, the
required bandwidth for the GB sources is greater than the channel capacity
(hypothesis H2), i.e., the comparison here is with the approximate solution.

Figure 9.6 Comparison with the approximate 1-buffer model; $Q_1$ = 20.

Figure 9.7 Comparison with the approximate 1-buffer model; $Q_1$ = 400.

Figure 9.8 Comparison with the approximate 1-buffer model, assuming
statistical allocation.

Finally, we present a few results concerning the transient behavior of
the low-priority buffer. Specifically, we are interested in how this buffer
reacts when another high-priority connection is added to the existing
traffic, as motivated by the following concerns.

1) An analysis of steady-state performance, as undertaken above,
does not reveal what occurs just after a state-change in the
(composite) traffic model.

2) More precisely, when a type-0 source becomes active (On) at a
certain time or, alternatively, when a new source is added in its
active state at a certain time, the occupancy probability of the
low-priority buffer gets conditioned by that event, resulting in transient
behavior that deserves investigation.

3) Moreover, differences between transient and steady-state
performance are likely to be more pronounced as the burstiness of the
sources is increased.

As the consequence of a brief study in this regard, we find that 3)
indeed holds, where the sources assumed (prior to the transient phase)
are as follows.

High priority (Type 0): $N_0 = 4$, $R_0 = 5$, $L_0 = 50, 100$, $\rho_0 = 0.33$.

Low priority (Type 1): $N_1 = 8$, $R_1 = 15$, $L_1 = 100$, $\rho_1 = 0.75$.

The system in question is as depicted in Figure 9.1, i.e., there are
separate buffers for type-0 and type-1 traffic. Given steady-state
operation in the presence of the above-stated traffic, a fifth type-0
source (identical to the other four) is then added at some time $t_a$,
where this source is presumed to be active (On) at the time of its
addition. Figure 9.9 illustrates the transient behavior of the low-priority
buffer for this example, where time is measured relative to $t_a$, i.e.,
$\tau = t - t_a$ with $\tau \in [0, 4000]$. The vertical axis is the
congestion probability $P_{\mathrm{full}}(\tau)$, i.e., the probability
that buffer $B_1$ is full just after the next type-1-arrival slot that
follows time $\tau$. We considered this value since it is easy to compute
and is a good (qualitative) indicator of the type-1 cell loss probability.
The two curves of Figure 9.9 correspond to the two choices of the mean
burst length $L_0$ indicated above, where we see that greater burstiness
($L_0 = 100$ as compared with $L_0 = 50$) implies a higher value of
$P_{\mathrm{full}}(\tau)$ throughout the transient region. Moreover, as
anticipated, these curves increase from the steady-state value at time
$t_a$ to the new steady-state value in the presence of five type-0
connections. However, with $L_0 = 50$ the curve is monotone increasing, but
with $L_0 = 100$ it is not.




Figure 9.9 Transient congestion probability as a function of $\tau$.

Since this difference is barely perceptible in Figure 9.9, it can be seen
more clearly by considering the relative difference

$$\frac{P_{\mathrm{full}}(4000) - P_{\mathrm{full}}(\tau)}{P_{\mathrm{full}}(4000)}$$

and comparing the values for $L_0 = 50$ and $L_0 = 100$ (again for
$\tau \in [0, 4000]$). This is done in Figure 9.10, where we observe that
the curve for $L_0 = 100$ is negative for approximately 2000 time slots,
meaning that $P_{\mathrm{full}}(\tau) > P_{\mathrm{full}}(4000)$ during
this period. This is the most important consequence of the analysis just
described. Specifically, beyond what was anticipated in item 3) above, we
find that, with sufficiently bursty sources, type-1 cell-loss probabilities
during a transient period can exceed those experienced under steady-state
conditions. A more thorough investigation is required to assess the
magnitude and impact of such differences for various traffic scenarios.

Figure 9.10 Relative difference in transient congestion probability.

4. SUMMARY

This investigation has concerned an ATM multiplexer that is designed
to accommodate both GB and VBRnr connections. Service is prioritized
via buffers dedicated to each service class, where the low-priority VBRnr
traffic is served only if there are no cells in the buffer for the GB
connections. The ensuing study then developed an appropriate model for the
two-buffer system, followed by a steady-state analysis of how the
burstiness of high-priority sources affects the occupancy of the
low-priority buffer under alternative traffic assumptions. The first
presumes that the GB connections are allocated according to their peak bit
rates; in this case, the analysis was exact. The second presumes partial
statistical multiplexing of the GB traffic; here, the exact method was
extended so as to provide an approximate solution. The results obtained in
this regard reveal that the influence of GB-source burstiness is
considerable, even with peak allocation. In cases where the low-priority
buffer is sufficiently large (e.g., a capacity of 400), it was shown
further that an approximate single-buffer model provides results that
conform closely with those obtained by the earlier solution methods (for
either allocation hypothesis). Finally, returning to the two-buffer model,
its transient behavior was examined in a region that follows the addition
of an active high-priority source. Here it was found that, with
sufficiently bursty GB traffic, the congestion probabilities for the
low-priority buffer can exceed those experienced under steady-state
conditions.

Abstract

An analytical model for the performance analysis of a novel input access
scheme for an ATM switch is developed and presented in this paper. The
interconnection network of the ATM switch is internally nonblocking and is
provided with $N$ input queues for each input port for a switch of size
$N \times N$. That is, each input port maintains a separate queue for each
output port so as to reduce the head-of-line (HOL) blocking of conventional
input queueing switches. Each input is allowed to send just one cell per
slot time, and each output port is allowed to accept just one cell per slot
time. The switch is analyzed under saturated conditions and a closed-form
solution for the maximum throughput is derived. Using a tagged input queue
approach, an analytical model for evaluating the switch performance under
i.i.d. Bernoulli traffic for different offered traffic loads is developed.
The switch throughput, mean cell delay, and cell loss probability are
computed from the analytical model. The accuracy of the analytical model is
verified using simulation.

Keywords: ATM switch, analytical modeling, performance evaluation, computer
simulation



1 INTRODUCTION 

Input queueing is preferred in implementing switching architectures for ATM
(Awdeh et al 1995) because of its simplicity. However, input-queued
switches suffer from the head-of-line (HOL) blocking problem, which limits
the throughput of each input port to a maximum of 58.6% under uniform
traffic, and much lower for bursty traffic (Pattavina et al 1993). Several
approaches have been proposed to overcome this problem: adopting a switch
expansion, a windowing technique, or a channel grouping technique (Awdeh et
al 1995). Of particular interest to us in this paper is a recent technique
termed the parallel iterative matching (PIM) algorithm and its variants
(Anderson et al 1993, McKeown 1994), which uses parallelism, randomness, and
iteration to find a maximal matching between the inputs that have queued
cells for transmission and the outputs that have queued cells (at the
inputs) destined for them. Each input port of the switch contains a random
access buffer consisting of $N$ FIFO queues, each of which stores the cells
destined for one of the $N$ output ports. The first cell in each queue can
be selected for transmission across the switch in each time slot, with the
following constraints: (i) Only one cell from any of the $N$ queues in an
input port can be transmitted in each time slot. (ii) At most one cell can
be transmitted from the $N$ input ports to an output port of the switch in
any given time slot.

To facilitate mathematical analysis, we modify the original PIM algorithm
into a logically equivalent algorithm. The modified PIM algorithm iterates
the following two steps until a maximal matching is found or until a fixed
number of iterations are performed:

1. Each unmatched input chooses an output uniformly over all unmatched
outputs for which it has queued cells and sends a request to it.

2. If an unmatched output receives any requests, it chooses one uniformly
over all the requests and notifies each requesting input.
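
The two steps above can be sketched as a small simulation. This is an illustrative reading of the modified algorithm, not the authors' implementation: the function name, the boolean occupancy matrix `has_cell`, and the use of a seeded `random.Random` instance are all assumptions of the sketch.

```python
import random


def pim_match(has_cell, iterations, rng):
    """Run the modified PIM for a fixed number of iterations.

    has_cell[i][j] is True if input i has a queued cell for output j.
    Returns a dict mapping each matched input to its matched output.
    """
    n = len(has_cell)
    free_in, free_out = set(range(n)), set(range(n))
    match = {}
    for _ in range(iterations):
        # Step 1: every unmatched input requests one eligible unmatched output.
        requests = {}
        for i in sorted(free_in):
            candidates = [j for j in sorted(free_out) if has_cell[i][j]]
            if candidates:
                requests.setdefault(rng.choice(candidates), []).append(i)
        if not requests:
            break                       # the matching is already maximal
        # Step 2: every requested output grants one request uniformly at random.
        for j, inputs in requests.items():
            i = rng.choice(inputs)
            match[i] = j
            free_in.discard(i)
            free_out.discard(j)
    return match


if __name__ == "__main__":
    rng = random.Random(1)
    n = 8
    saturated = [[True] * n for _ in range(n)]   # every queue holds a cell
    print(len(pim_match(saturated, 1, rng)), "outputs matched after 1 iteration")
```

Under saturation each iteration matches at least one further input-output pair, so $N$ iterations always suffice for a complete matching in this sketch.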

The remainder of this paper is organized as follows. Section 2 presents 
recursive equations for the maximum throughput of the switch. Section 3 de- 
velops an analytical model based on the tagged queuing approach. Equations 
for computing interesting performance measures including throughput, mean 
cell delay, and mean cell loss probability are derived in this section. Numer- 
ical results obtained from the analytical model are presented for switches of 
different sizes in Section 4, and compared with the results from simulation. 
Finally conclusions are presented in Section 5. 



2 MAX THROUGHPUT OF MULTIPLE ITERATIONS PIM 

Under saturated conditions, all the queues at each input will have at least one
cell so that each output will have requests from every unmatched input. An
output selects one uniformly among the input requests. The throughput of the
ATM switch with 1 iteration PIM scheduling, $\rho(1)$, is equal to the probability
that an output $O_j$ gets matched after the first iteration. The probability of
an input request being accepted by an output is $p = 1/N$. Then,

$$\rho(1) = 1 - \left(1 - \frac{1}{N}\right)^{N}, \qquad \lim_{N \to \infty} \rho(1) = 1 - e^{-1} = 0.632. \tag{1}$$
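
As a quick numerical check of Eq (1), the short sketch below evaluates the single-iteration saturation throughput for a few switch sizes. The function name is an assumption made for illustration.

```python
import math

def rho_one_iteration(n):
    """Saturation throughput of single-iteration PIM for an n x n switch, Eq (1)."""
    return 1.0 - (1.0 - 1.0 / n) ** n

# The values decrease with the switch size towards 1 - 1/e.
for n in (2, 4, 8, 16, 32, 1024):
    print(n, round(rho_one_iteration(n), 4))
print("limit", round(1.0 - math.exp(-1.0), 4))
```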

Let Pr{m(1)} and Pr{n(1)} respectively be the probabilities that m(1)
inputs (outputs) get matched or n(1) remain unmatched, and output $O_j$ remains
unmatched, after the first iteration. Then, Pr{n(1)} = Pr{m(1) = N - n(1)},
and Pr{m(1)} is expressed through $S_N^{(m(1))}$, where $S_n^{(m)}$ is the Stirling number
of the second kind, which gives the number of ways of partitioning a set
of n elements into m non-empty subsets (Abramowitz et al 1972):

$$S_n^{(m)} = \frac{1}{m!} \sum_{k=0}^{m} (-1)^{m-k} \binom{m}{k} k^{n}.$$
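
The Stirling numbers of the second kind can be evaluated directly from their explicit alternating sum. The sketch below also uses them in the standard occupancy count for the number of distinct outputs requested when n inputs each pick one of n outputs uniformly. Treat this as an unconditional illustration: the paper's Pr{m(1)} may additionally condition on output Oj remaining unmatched, and the function names are assumptions.

```python
from math import comb, factorial

def stirling2(n, m):
    """Stirling number of the second kind via the explicit alternating sum."""
    total = sum((-1) ** (m - k) * comb(m, k) * k ** n for k in range(m + 1))
    return total // factorial(m)

def distinct_outputs_pmf(n):
    """pmf of the number of distinct outputs requested when n inputs each
    pick one of n outputs uniformly at random (standard occupancy count)."""
    return {m: comb(n, m) * factorial(m) * stirling2(n, m) / n ** n
            for m in range(1, n + 1)}
```

The mean of this distribution is n(1 - (1 - 1/n)^n), which agrees with Eq (1) multiplied by the switch size.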

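The throughput of two iterations PIM scheduling is equal to the sum of
$\rho(1)$ and the probability that output $O_j$ gets matched in the second iteration,
that is

$$\rho(2) = \rho(1) + \Pr\{O_j \text{ gets matched in the second iteration} \mid O_j \text{ wasn't matched in the first iteration}\}$$

$$\phantom{\rho(2)} = \rho(1) + \sum_{n(1)=1}^{N-1} \left(1 - \left(1 - \frac{1}{n(1)}\right)^{n(1)}\right) \Pr\{n(1)\},$$

$$\Pr\{n(i)\} = \sum_{n(i-1)=n(i)+1}^{N-(i-1)} \Pr\{m(i) = n(i-1) - n(i)\}\, \Pr\{n(i-1)\}, \tag{2}$$

where Pr{m(i)} is the probability that m(i) inputs (outputs) get matched in the
ith iteration. Using Eq (2), the throughput of i iterations PIM scheduling,
$\rho(i)$, is

$$\rho(i) = \rho(i-1) + \sum_{n(i-1)=1}^{N-(i-1)} \left(1 - \left(1 - \frac{1}{n(i-1)}\right)^{n(i-1)}\right) \Pr\{n(i-1)\}.$$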
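
The recursion over iterations can also be evaluated numerically. The sketch below is not the paper's formula but an equivalent-in-spirit computation under the same saturation assumption: it tracks the distribution of the number of unmatched ports n(i) and, by symmetry among outputs, reports the expected fraction of matched outputs as the throughput. All names are illustrative assumptions.

```python
def matched_pmf(n):
    """pmf of how many outputs receive at least one request when n
    unmatched inputs each pick one of n unmatched outputs uniformly."""
    dist = [1.0] + [0.0] * n             # dist[d]: d distinct outputs so far
    for _ in range(n):                   # add the inputs' requests one by one
        nxt = [0.0] * (n + 1)
        for d, p in enumerate(dist):
            if p == 0.0:
                continue
            nxt[d] += p * d / n          # hits an already requested output
            if d < n:
                nxt[d + 1] += p * (n - d) / n
        dist = nxt
    return dist

def saturation_throughput(size, iterations):
    """Expected fraction of outputs matched after the given number of iterations."""
    unmatched = {size: 1.0}              # distribution of n(i)
    for _ in range(iterations):
        nxt = {}
        for n, p in unmatched.items():
            if n == 0:
                nxt[0] = nxt.get(0, 0.0) + p
                continue
            for m, q in enumerate(matched_pmf(n)):
                if q:
                    nxt[n - m] = nxt.get(n - m, 0.0) + p * q
        unmatched = nxt
    return 1.0 - sum(n * p for n, p in unmatched.items()) / size
```

For one iteration the result coincides with Eq (1), for example `saturation_throughput(8, 1)` equals 1 - (7/8)^8 up to rounding.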



Figure 1 shows the results for maximum throughput as a function of switch
size and number of iterations. As shown in this figure, the maximum throughput
of an ATM switch with 1 iteration PIM scheduling converges to 0.63 (which
corresponds to Eq (1)) as the switch size grows. Furthermore, the throughput
increases significantly after each iteration of PIM scheduling. Four iterations
are sufficient for achieving a maximum throughput of about 99% for a
switch of any size.



Figure 1 Maximum throughput as function of switch size and number of
iterations.



3 QUEUEING MODEL AND ANALYSIS OF MULTIPLE
ITERATIONS PIM

In this section, we model the ATM switch with PIM scheduling using queueing
theory and analyze the underlying Markov chain. Our method uses the
concept of tagged queues in modeling the PIM switch, leading to a smaller
state space. The concept of a tagged input queue has been successfully used to
evaluate the FIFO input-queued switch model (Pattavina et al 1993, Youn et
al 1994). These switches involve a single stage of contention resolution. On
the other hand, for the switch with PIM scheduling, the contention resolution
process consists of two stages. As observed from the algorithm descriptions
of PIM, a HOL cell in an input queue will contend for transmission not only
with the HOL cells of the same input, but also with the HOL cells destined for
the same output. As a result, the corresponding model is more complicated
than for the FIFO input-queued switch. We make the following assumptions
in developing the PIM switch model:

1. The switch operates synchronously.
2. Every input queue has the same buffer size, namely $b_i$.
3. Cells arrive at every input queue according to an i.i.d. Bernoulli process
with probability $\lambda$.
4. New cells arrive only at the beginning of the time slots, and cells depart
only at the end of the time slots.




Figure 2 An example of the queueing model for the PIM switch. 

Under the above assumptions, all the input queues will exhibit the same
behavior when the system attains steady state. A queue at input i with output
j as the destination is denoted by Q(i,j). Figure 2 shows an example of
the queueing model for the PIM switch. In this example, Q(1,1) is taken as
the tagged input queue, the number of HOL cells at input
1 is represented by the 1st HOL input queue, and the number of HOL cells
addressed for output 1 is denoted by the 1st HOL output queue. Both the
HOL input queue and the HOL output queue are virtual queues which don't
exist in a real PIM switch but are useful for our mathematical analysis.



3.1 Markov model 

Analyzing the queueing model of the PIM switch requires the construction of
the underlying Markov chain Z. The states of the Markov chain Z are sampled
at the end of the time slots and can be expressed as a triplet $(l, w_i, w_o)$, where
$l$, $w_i$, and $w_o$ refer to the lengths of the tagged input queue, virtual HOL input
queue, and virtual HOL output queue, respectively. The state-space of this
three-dimensional Markov chain is

$$\{(0,0,0),\ (l, w_i, w_o) \mid 1 \le l \le b_i,\ 1 \le w_i \le N,\ 1 \le w_o \le N\}$$

and the states are ordered lexicographically, that is, $(0,0,0), (1,1,1), \ldots, (b_i, N, N)$.
The set of states $\{(l,1,1), (l,1,2), \ldots, (l,2,1), \ldots, (l,N,N)\}$ will be labelled as
states in level $l$ of the Markov chain. This Markov chain is a Quasi Birth and
Death (QBD) process whose transition probability matrix T has the
block-partitioned form

$$T = \begin{bmatrix}
A'_1 & A'_2 & 0 & \cdots & & \\
A'_0 & A_1 & A_2 & 0 & \cdots & \\
0 & A_0 & A_1 & A_2 & 0 & \cdots \\
\vdots & & \ddots & \ddots & \ddots & \\
0 & 0 & \cdots & A_0 & A_1 & A_2 \\
0 & 0 & \cdots & 0 & S & B
\end{bmatrix}$$

where $A'_1 + A'_2 e = 1$ and $A_0 e + A_1 e + A_2 e = (A_0 + A_1 + A_2)e = e$ with
$e = [1, 1, \ldots, 1]^T$. Let $P_{blo,W_t(w'_i,w'_o) \mid W_{t-1}(w_i,w_o)}$ denote the probability that
the HOL cell of the tagged queue is blocked, and $P_{suc,W_t(w'_i,w'_o) \mid W_{t-1}(w_i,w_o)}$
denote the probability that the HOL cell of the tagged queue is transmitted,
given that the numbers of remaining HOL cells at the end of the last time slot are
$(w_i, w_o)$ and those at the end of the current time slot are
$(w'_i, w'_o)$. Define the matrices $B$, $B_0$ and $S$ as $B = [P_{blo,W_t(w'_i,w'_o) \mid W_{t-1}(w_i,w_o)}]$,
$B_0 = [P_{blo,W_t(w'_i,w'_o) \mid W_{t-1}(0,0)}]$ and $S = [P_{suc,W_t(w'_i,w'_o) \mid W_{t-1}(w_i,w_o)}]$, where
$0 < w'_i, w'_o, w_i, w_o \le N$.

Let $S_0$ be the probability that the HOL cell of the tagged input queue gets
matched given that the tagged input queue is empty at the end of last time
slot. From the definitions of $B_0$, $S_0$, $S$, and $B$, we can show that:

$$S_0 + B_0 e = 1 \tag{3}$$

$$S_e + B_e = e \tag{4}$$

where $S_e = Se$, $B_e = Be$. As illustrated in the appendix of this paper, Eq (3)
and Eq (4) help us solve the Markov chain using the Matrix-Geometric approach
by simply focusing the computation on matrix $B$ and vector $B_0$. By
using the above equations, the element matrices in the transition probability
matrix T can be computed as:

$$A'_0 = (1-\lambda) S_e, \qquad A'_1 = (1-\lambda) + \lambda S_0, \qquad A'_2 = \lambda B_0,$$

$$A_0 = (1-\lambda) S, \qquad A_1 = \lambda S + (1-\lambda) B, \qquad A_2 = \lambda B.$$

The remaining subsections cover the computation of the success and
blocking probabilities, $P_{suc,W_t(w'_i,w'_o) \mid W_{t-1}(w_i,w_o)}$ and $P_{blo,W_t(w'_i,w'_o) \mid W_{t-1}(w_i,w_o)}$
respectively. Once these probabilities are computed, the transition probability
matrix T can be constructed. Once the transition probability matrix is
known, it is a routine matter to derive the steady state equations by utilizing
the properties of Markov chains, and to solve the equations to obtain
the steady-state probability vector. Detailed procedures are presented in the
appendix of this paper.
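
To make the block structure concrete, the following sketch assembles T from given blocks and can be used to check that every row sums to one. It assumes the blocks as they can be read from the balance equations (8)-(12) in the appendix, namely first row [(1-λ)+λS_0, λB_0], interior rows [(1-λ)S, λS+(1-λ)B, λB] and last row [S, B]. The function and argument names are hypothetical, and the toy numbers in the usage note are made up.

```python
def build_transition_matrix(lam, s0, b0, S, B, levels):
    """Assemble T for levels 0..levels from the blocks S, B, s0 and b0.

    s0 is a scalar, b0 a row vector of length k, S and B are k x k lists
    with S e + B e = e and s0 + sum(b0) = 1.
    """
    assert levels >= 2
    k = len(b0)
    size = 1 + levels * k
    T = [[0.0] * size for _ in range(size)]

    def col(level, v):
        """Index of state v of the given level (level 0 is index 0)."""
        return 1 + (level - 1) * k + v

    # Level 0: stay empty, or an arriving cell is blocked and enters level 1.
    T[0][0] = (1 - lam) + lam * s0
    for v in range(k):
        T[0][col(1, v)] = lam * b0[v]

    for level in range(1, levels + 1):
        last = level == levels
        for u in range(k):
            row = col(level, u)
            for v in range(k):
                down = S[u][v] if last else (1 - lam) * S[u][v]
                stay = B[u][v] if last else lam * S[u][v] + (1 - lam) * B[u][v]
                if level == 1:
                    T[row][0] += down          # level 0 is a single state
                else:
                    T[row][col(level - 1, v)] = down
                T[row][col(level, v)] = stay
                if not last:
                    T[row][col(level + 1, v)] = lam * B[u][v]
    return T
```

For instance, with the made-up blocks `S = [[0.3, 0.2], [0.1, 0.4]]`, `B = [[0.4, 0.1], [0.25, 0.25]]`, `s0 = 0.6`, `b0 = [0.3, 0.1]`, three levels and λ = 0.5, the result is a 7 x 7 stochastic matrix.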



3.2 Computing the blocking and success probabilities 

We now derive the equations for computing the blocking and success
probabilities. The transition of the state of the virtual HOL input/output queues
from the state $(w_i, w_o)$ to state $(w'_i, w'_o)$ is a two step process illustrated in
Figure 3: First, we account for the newly arriving HOL cells to the virtual HOL
input/output queues. Then, we consider the transition from the intermediate
state to the final state after applying the PIM algorithm.

[Figure 3 diagram: $(w_i, w_o)$ -> $(h_i, h_o)$ through the arriving HOL cells
$(k_i, k_o)$; $(h_i, h_o)$ -> $(w'_i, w'_o)$ through the PIM algorithm finding a
maximal matching.]

Figure 3 Transition of the virtual HOL queues.



(a) Arriving cells at the virtual HOL queues 

Let $K_t(k_i,k_o)$ denote the numbers of newly arriving HOL cells at the virtual
HOL input/output queues ($k_i$/$k_o$ new arrivals to the virtual HOL input/output
queue) at the beginning of the current time slot $t$. $W_{t-1}(w_i,w_o)$ denotes the
numbers of remaining HOL cells at the virtual HOL input/output queues ($w_i$/$w_o$ is
the length of the virtual HOL input/output queue) at the end of the previous time slot
$t-1$. Let $H_t(h_i,h_o) = K_t(k_i,k_o) + W_{t-1}(w_i,w_o)$. Define $a_{K(k_i,k_o) \mid W(w_i,w_o)} =
\mathrm{Prob}(K_t(k_i,k_o) \mid W_{t-1}(w_i,w_o))$. Let $p_0$ be the probability that a queue is empty
in a time slot, and $p_1 = 1 - p_0$. A cell that arrives at $Q(i,j)$ when $Q(i,j)$ is
empty will observe that another queue is non-empty with probability $p_1$. If
the current state is $(l,w_i,w_o)$, each of $(N - w_i)$ queues of input $i$ and each of
$(N - w_o)$ $j$th queues of the inputs will be non-empty with probability $p_1$. Hence,

$$a_{K(k_i,k_o) \mid W(w_i,w_o)} = \begin{cases}
0, & k_i < 0 \text{ or } k_o < 0, \\[4pt]
\binom{N-1}{k_i-1}\binom{N-1}{k_o-1}\, p_1^{k_i+k_o-2} (1-p_1)^{2N-(k_i+k_o)}, & 1 \le k_i \le N,\ 1 \le k_o \le N,\ w_i = w_o = 0, \\[4pt]
\binom{N-w_i}{k_i}\binom{N-w_o}{k_o}\, p_1^{k_i+k_o} (1-p_1)^{2N-(k_i+k_o+w_i+w_o)}, & 0 \le k_i \le N-w_i,\ 0 \le k_o \le N-w_o,\ 1 \le w_i, w_o \le N.
\end{cases}$$
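
The arrival probabilities are products of two binomial terms, one for the virtual HOL input queue and one for the virtual HOL output queue. The sketch below implements one reading of the piecewise expression, with binomial coefficients C(N-1, k_i-1) C(N-1, k_o-1) when both virtual queues were empty and C(N-w_i, k_i) C(N-w_o, k_o) otherwise. That reading is an assumption where the printed formula is unclear, as is the function name.

```python
from math import comb

def arrival_prob(ki, ko, wi, wo, n, p1):
    """Probability of (ki, ko) new HOL arrivals given (wi, wo) remaining cells.

    n is the switch size and p1 the probability that a queue is non-empty.
    """
    if ki < 0 or ko < 0:
        return 0.0
    if wi == 0 and wo == 0:
        # The tagged cell itself accounts for one arrival at each virtual queue.
        if not (1 <= ki <= n and 1 <= ko <= n):
            return 0.0
        return (comb(n - 1, ki - 1) * comb(n - 1, ko - 1)
                * p1 ** (ki + ko - 2) * (1 - p1) ** (2 * n - ki - ko))
    if not (0 <= ki <= n - wi and 0 <= ko <= n - wo):
        return 0.0
    return (comb(n - wi, ki) * comb(n - wo, ko)
            * p1 ** (ki + ko) * (1 - p1) ** (2 * n - ki - ko - wi - wo))
```

For fixed $(w_i, w_o)$ these values form a probability distribution over $(k_i, k_o)$, which gives a simple sanity check for any implementation.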



(b) Transition to $W_t(w'_i, w'_o)$

Having determined the number of cell arrivals to the virtual HOL queues,
we now consider the transition from the intermediate state to the final state
after applying the PIM algorithm. Given the tagged input queue $Q(i,j)$, the
inputs excluding input $i$ are divided into two subsets E and F according to
whether the $j$th queue of the inputs is empty or not. The cardinalities of these
sets are $(N - w_o)$ and $(w_o - 1)$ respectively. The state of set E and set F
will affect the transitions of the virtual HOL input queue and virtual HOL output
queue. For the HOL cell of the tagged input queue, its contention process can
be split into two stages. In the first stage, the tagged input queue contends
with other non-empty queues at the same input. If it succeeds in the first
stage contention, it joins the second stage contention with all successful $j$th
queues from other inputs. Let $Q(i,k)$ $(k \ne j)$ be the successful queue at input
$i$ if $Q(i,j)$ is blocked in the first contention stage. We define the following
probabilities associated with the second transition step in Figure 3:

• $P_{blo\_00 \mid H(h_i,h_o)}$ = Prob{the HOL cell at the tagged input queue gets blocked,
and $W_t(w'_i,w'_o) = H_t(h_i,h_o)$ given $H_t(h_i,h_o)$}

• $P_{blo\_01 \mid H(h_i,h_o)}$ = Prob{the HOL cell at the tagged input queue gets blocked,
and $W_t(w'_i,w'_o) = H_t(h_i,h_o - 1)$ given $H_t(h_i,h_o)$}

• $P_{blo\_10 \mid H(h_i,h_o)}$ = Prob{the HOL cell at the tagged input queue gets blocked,
and $W_t(w'_i,w'_o) = H_t(h_i - 1,h_o)$ given $H_t(h_i,h_o)$}

• $P_{blo\_11 \mid H(h_i,h_o)}$ = Prob{the HOL cell at the tagged input queue gets blocked,
and $W_t(w'_i,w'_o) = H_t(h_i - 1,h_o - 1)$ given $H_t(h_i,h_o)$}

• $P_{suc \mid H(h_i,h_o)}$ = Prob{the HOL cell at the tagged input queue gets transmitted,
and $W_t(w'_i,w'_o) = H_t(h_i - 1,h_o - 1)$ given $H_t(h_i,h_o)$}







Given $r_i = w'_i - w_i$ and $r_o = w'_o - w_o$, the blocking probability
$P_{blo,W_t(w'_i,w'_o) \mid W_{t-1}(w_i,w_o)}$ is computed as:



( 0, for ri < -1 or To < ~1 

^K{n,ro)\W{wi^Wo)^bloJ)0\H{w^,w'^) 

{n ,ro+l) I W {wi ,Wo)^bloJ)l\H{w'. (To) I'^ o 

'^^K{ri,ro)\W{wi,Wo)^bloJ)l\H{w'^,w•^)0‘ w‘^-l ^ ) 

(ri +1 ,ro ) I W {wi ,Wo ) ^blo-10\H{w[ +1 ,u;^ ) ^0 (^i ) /^i 
~^^K{rijro)\W{wi,Wo)^blo^lO\H{w!^,w[^)0- w\ — l ^ ) 

d"^/C(ri+l,ro+l)lW(ti;<,iyo)-^Wo_ll|if(tyJ+l,iyJ,+l) 

I ^ D lo(ri){ro-l-lo{ro-l)) 

~TCiK(^ri-^l,ro)\W(wi,Wo)-^blo-ll\H{w'^-^l,w'^)~^ w[-{w[,-l) ^ 

• ^ ID iri-l-io{ri-l))lo{ro) 

'^(^K{n,ro-\-l)\W{wi,Wo)-^blo-ll\H{w[,w[,+l) {wC-l)-w{, 

, p (rj-l-fo(r»-i))(ro-l-/o(ro-l)) 

~raK{ri,ro)\W{wi,Wo)^blo-ll\H{w[,w>^) lw[-iy{w'^-l) 

, for ri >1 and To > 1 and Wi = 0 and Wq = 0 

0>K{ri,ro)\W{wi,Wo)^blo-00\H{wyw'^) 

-^aK{ri,ro+l)\W{wi,Wo)PbloJ)l\H{wyw>^-\-l){ro + 1 + ^o(«^o “ 

'^(^K{ri,ro)\W{wi,Wo)^bloJ)l\H{w[,w'^){'^o ““ 1 “ Iq(Wo ~ 1))/(u^q ~ 1) 

+O.K{ri+l,ro)\W{wi,Wo)Pblo.lQ\H{w'.+lMo)^'^i + 1 + “ l))/^i 

+«iC(ri,ro)|W(ii;i,i/;o)-^6/o_10|/f(«;;,0(^i “ 1 k{'^i ~ 1))/W ~ 1) 

1 _ 7~) (ri+14-io(iyi“l))(»'o + l+/o(iyo“l)) 

+aK(n+i,ro+i)\w{wi,w.)^bio.n\H(w',+i,w'^+i)^ 

~^^K{ri,ro+l)\W{wi,Wo)^blo-ll\H{wyw'^-{-l) 

{n ^ro)\W{wi ,Wo)^blo^ll\H{w^ ,w'^) 



( 5 ) 



. . . 1) 

{wi-l-loiwi-l))(ro+l-{-lo{wo-l)) 

(K-i) ■ 



(l«i-l-io(uU-l))(t«o-l-to(u.o-l)) 
(tu;-i)(t<-i) 



1. /or Ti > —1 and r„ > —1 and > 0 and Wo>0 



in which 



/oH-{ □p,“(i-p,ir-“ 



for w = 0 

forw > 0 



( 6 ) 



represents the number of input queues that contain only one buffered cell,
and $P_{11}$ in Eq (6) is the probability that an input queue length is equal to one
(there is only one buffered cell in this input queue) during a time slot, and is
given by $P_{11} = (1 - \lambda)\pi_1 e/(1 - \pi_0)$.

For $W_{t-1}(w_i,w_o) = (0,0)$, the blocking probability $P_{blo,W_t(w'_i,w'_o) \mid W_{t-1}(0,0)}$
can be computed by Eq (5) provided that $P_{11}$ in Eq (6) is replaced
by $(\lambda\pi_0 + (1 - \lambda)\pi_1 e)/p_1$.

We can compute the probability $P_{suc,W_t(w'_i,w'_o) \mid W_{t-1}(w_i,w_o)}$ as

$$P_{suc,W_t(w'_i,w'_o) \mid W_{t-1}(w_i,w_o)} = a_{K(k_i,k_o) \mid W(w_i,w_o)}\, P_{suc \mid H(k_i+w_i,\,k_o+w_o)}.$$



(c) Applying the PIM algorithm 

We now compute the probabilities in Figure 3 by considering each iteration of
the PIM scheduling algorithm. The state of the switch at the beginning and
end of each iteration $\phi$ is characterized by the following parameters:

• $n(\phi)$: the number of unmatched inputs/outputs at the beginning of the $\phi$th
iteration.

• $h_i(\phi)$: the number of non-empty queues in input $i$ at the beginning of the $\phi$th
iteration, whose outputs are still unmatched.

• $h_o(\phi)$: the number of non-empty $j$th queues in the $n(\phi)$ inputs (including input
$i$) at the beginning of the $\phi$th iteration.

• $m(\phi)$: the number of inputs/outputs that get matched at the end of the $\phi$th
iteration, $m(\phi) = n(\phi) - n(\phi + 1)$.

• $\Delta h_i(\phi)$: the number of outputs, with corresponding non-empty queues in
input $i$, that get matched at the end of the $\phi$th iteration,
$\Delta h_i(\phi) = h_i(\phi) - h_i(\phi + 1)$.

• $\Delta h_o(\phi)$: the number of inputs in set F that get matched at the end of the $\phi$th
iteration, $\Delta h_o(\phi) = h_o(\phi) - h_o(\phi + 1)$.

For the sake of simplicity, we do not mention the iteration number in the
following discussion. If no iteration number is mentioned, then the current
iteration $\phi$ is implied.

Let $x_i x_j$ represent the state of the matching process for input $i$ and output $j$
of the switch, where $x_i, x_j \in \{0, 1\}$, with 0 representing that the input/output
is unmatched and 1 representing that the input/output is matched at the
end of the current iteration. The possible states of the matching process are
00, 01, 10, and 11. However, the state 11 should explicitly consider whether the
tagged input queue $Q(i,j)$ at input $i$ is matched. Thus the state 11 is split
into two: $11_{suc}$ and $11_{blo}$ respectively. Given the current state of the switch
$(n(\phi), h_i(\phi), h_o(\phi))$ and the current state of the matching process $x_i x_j$, the
resulting state of the switch $(n(\phi + 1), h_i(\phi + 1), h_o(\phi + 1))$ and the resulting
state of the matching process $x'_i x'_j$ are controlled by the transition probabilities
shown in Figure 4. These probabilities are functions of the current state of the
switch $(n(\phi), h_i(\phi), h_o(\phi))$ and are defined as:

• $P_{blo\_x'_i x'_j \mid x_i x_j}$ = Prob{at the end of the current iteration, the HOL cell at the
tagged input queue $Q(i,j)$ gets blocked; $x'_i x'_j$ / $x_i x_j$ represent whether
input $i$ ($x'_i$ or $x_i$) and output $j$ ($x'_j$ or $x_j$) remain unmatched (represented
by 0) or get matched (represented by 1) at the end/beginning of the current
iteration}.

• $P_{suc \mid 00}$ = Prob{at the end of the current iteration, the HOL cell at the tagged
input queue $Q(i,j)$ gets matched with output $j$, given that input $i$ and output
$j$ were unmatched at the beginning of the current iteration}.




Figure 4 The matching process state transition diagram. 

To derive equations for the transition probabilities, we define the following
probabilities associated with the first stage of contention for a cell and give
their formulas below:

• $P_{suc1\_e}$ = Prob{the HOL cell at the $k$th $(k \ne j)$ queue of an input in set E
succeeds in the first stage contention}.

• $P_{suc1\_ft}$ = Prob{the HOL cell at the $j$th queue of an input in set F succeeds in
the first stage contention}.

• $P_{suc1\_fe}$ = Prob{the HOL cell at the $k$th $(k \ne j)$ queue of an input in set F
succeeds in the first stage contention}.






sucl-e — 



Tl — 1 

n— 1 



sucl-fe 



— ^ ^ sucl— /f 

~ n— 1 



Let $t$ $(\max(m, h_o - 1) \le t \le n - 1)$ be the number of queues, excluding
the queue from input $i$, that succeed in the first stage of contention, and
let $m$ be the number of outputs contended for by the $t$ inputs. There are three
sub-problems to be considered in computing the transition probabilities in the
$\phi$th iteration given $(n(\phi), h_i(\phi), h_o(\phi))$ and $(n(\phi + 1), h_i(\phi + 1), h_o(\phi + 1))$:

1. What is the probability that $t$ inputs contend for $m$ outputs?
2. What is the probability that $\Delta h_o$ inputs in set F (whose cardinality is
$h_o - 1$) get matched?
3. What is the probability that $\Delta h_i$ out of the $h_i$ outputs whose corresponding
queues in input $i$ are non-empty get matched?

The equations below consider each of the sub-problems in computing the
transition probabilities.

Computing $P_{blo\_00 \mid 00}$. In this case both input $i$ and output $j$ remain
unmatched. Given $t$ and $m$, the probability that the queue that succeeds in the
first stage of contention at input $i$ gets blocked at its corresponding output is



771 

Pt^m\MoJ00.00 = (l-^)(m!5r + (m-l)!5r 



'Po 



pho—l 

sucl_/e 



1 \ pt-ho-\-l 

sucl-e 



Among the m outputs that get matched, A/i* of them will see their corre- 
sponding queues in input i being non-empty. The number of combinations 
satisfying this condition is C'Ah,jWo.ooj)o = (aC-i) (m"-Ah,)- 

Given that input i is blocked, it is clear that each combination of m out of t 
inputs gets matched with equal probability. The probability that Aho inputs 

which are elements of the set F get matched is PAho\bio. 00.00 = . 

Vm/ 

Knowing the above probabilities, Pbio.oo\oo can be easily computed as 



n — 1 



Pbio.oo\oo — (1-1/hi) 

t=max{m,ho — l) 



/ n - ho \ 
— ho + 1/ 



CAhi\blo-00.00 



PAho\blo.OO.OoPt-^m\blo.OO.OO 



Computing $P_{blo\_10 \mid 00}$. In this case, input $i$ gets matched while output $j$
remains unmatched. So we compute only the aggregated probability over the
set of all possible $\Delta h_o$. The probability that the queue that succeeds in the
first stage of contention at input $i$ succeeds in getting matched in the second
stage of contention is



■t-^m\blo.l0.00 



(1 - + {m- l)\Sr^)Pl:tV 

An-l){n-l-t) pho-1 
Po ^sucl.fe 



The probability that Aho inputs which are elements of set F get matched is 
PAho\bio-iojoo = . Therefore, Ptio-io\oo is given by, 



■Pwo-ioioo 



(1 - l/hi) 




E 

t=max{m—l,ho—l) 



/ n - ho \ 

— ho + 1/ 



A/iol6/o_10_00M->ml6/o_10_00 



Computing $P_{blo\_01 \mid 00}$. In this case, output $j$ gets matched while input $i$
remains unmatched. We compute only the aggregated probability over the
set of all possible $\Delta h_i$. There are two cases to be considered here: (i) $Q(i,j)$
fails the first stage of contention, and (ii) $Q(i,j)$ survives the first stage of
contention. Therefore $P_{blo\_01 \mid 00} = P_{blo\_01\_B \mid 00} + P_{blo\_01\_S \mid 00}$, where $P_{blo\_01\_B \mid 00}$
and $P_{blo\_01\_S \mid 00}$ are the probabilities for the two cases (i) and (ii) respectively.

Case (i): $Q(i,j)$ fails in the first stage of contention



nio 



.01^100 - il-l/hi) £ 

min(/io — l|t— m+1) x, -v 

y~! f jC'Ahi|6IoJ)lJ3-00-Pt- 

u=l ^ ^ 



t->m|dio_01_B-00 



where 



pti pho—l—upt—ho+l^(^—^){n—l—t) 

^sucl^ft-^sucl-fe -^sucl^e "O 
P __ ^hi — 2\ /n — hi \ 

CAh,\bio.0i^.00 ~ [Ahi-2)[m-AhiJ 
Case (ii): Q{i,j) is successful in the first stage of contention 

PbloJ)l^SJ)0 = ^ W _ ^ ^ jC'A^i|6lo-01-S-00^t-fmj5io_01_S-.00 

* t=max{m,ho—l) ° 

where 

mm(i^o— wi+l) X X - 

■Pt^m|6JoJ)l-Sj00 = (l)(1 “ L ~ 

k=l ^ ' 

pk p/io— 1— ife 

^sucl-ft-^^sucl^fe -^sucl-e "O 

_ f n-hi \ 

CAh.|Mo.01^.00 - l^A/ii-ljU-A/iJ 



Computing $P_{suc \mid 00}$. The state $11_{suc}$ in Figure 4 is an absorbing state, so
this transition probability is computed without consideration of $m$.



P 1 f^o A ^ pu n - P ,\ho-l-u 

Psuc\00 - ^ ^Psuc^fti^ Psuc.ft) 

Computing $P_{blo\_11 \mid 00}$. $P_{blo\_11 \mid 00}$ is computed from the boundary
condition as $P_{blo\_11 \mid 00} = 1 - (P_{blo\_00 \mid 00} + P_{blo\_01 \mid 00} + P_{blo\_10 \mid 00} + P_{suc \mid 00})$.

Computing $P_{blo\_01 \mid 01}$. In this case, input $i$ remains unmatched, while
output $j$ is already matched. Then,



PbloJ 01\01 = ^ ( . jCAhi\bloJ)lJ)lPt-^m\bloMJ)l 

t=m ^ ' 



where 



m 



= (1 - + im- l)!5r-')PL.oPo^"~‘"‘^ 



P suc-O — 



1-pS 



CAh,lMo.01^1 - 



Computing $P_{blo\_11 \mid 01}$. In this case input $i$ gets matched while output $j$
has already been matched at the beginning of the iteration.



fftio-lljOl 



u =0 ^ ' 



n—l—u 



Computing $P_{blo\_10 \mid 10}$. In this case input $i$ is already matched, while output
$j$ remains unmatched at the end of the iteration. Then,



/n~l\ ^ /n-/io + l\p p 

nzo-10|10=l ^1 2 ^ \t-h ^l)^^^o\blo.l0^10^t-^m\blo.l0.l0 

^ ' t=max{ho—ljm) ^ 



where 

Pt^m\blo^lOAO 

PAho\blo^l0-10 



cm pt-ho-¥lAn-l)(n-t) 
^sucl^e Po 

(ho-l\ (t-ho-\-l\ 

V Aho J \m—AhoJ 

o 



pho — l 
■^sucl-fe 



Computing $P_{blo\_11 \mid 10}$. In this case, output $j$ gets matched at the end of the
iteration. This is feasible only if at least one of the $j$th queues of the $h_o - 1$ inputs
in set F succeeds in the first stage of contention at its input.

$$P_{blo\_11 \mid 10} = 1 - (1 - P_{suc1\_ft})^{h_o - 1}$$

The states of the switch at the end of each iteration, $(n(\phi), h_i(\phi), h_o(\phi), x_i x_j)$,
can be viewed as a weighted tree with the nodes of the tree corresponding
to the switch states. The root of the tree is the initial state of the switch
$(N, h_i, h_o, 00)$. All states in level $\phi$ of the tree correspond to the states of
the switch at the end of the $\phi$th iteration of the PIM algorithm. Weights
are assigned to the arcs between the states, and are equal to the transition
probabilities $P_{blo\_x'_i x'_j \mid x_i x_j}$ or $P_{suc \mid x_i x_j}$. Each state $(n(\phi), h_i(\phi), h_o(\phi), x_i x_j)$ is
assigned a probability $\Pr(n(\phi), h_i(\phi), h_o(\phi), x_i x_j)$ equal to the product of the
transition probabilities along the arcs from the root to the state. The
probabilities $P_{blo\_00 \mid H(h_i,h_o)}$, $P_{blo\_01 \mid H(h_i,h_o)}$, $P_{blo\_10 \mid H(h_i,h_o)}$, $P_{blo\_11 \mid H(h_i,h_o)}$ and $P_{suc \mid H(h_i,h_o)}$ at
the end of $\Phi$ iterations of the PIM algorithm can be computed as

$$P_{blo\_x_i x_j \mid H(h_i,h_o)} = \sum_{n(\Phi),\,h_i(\Phi),\,h_o(\Phi)} \Pr(n(\Phi), h_i(\Phi), h_o(\Phi), x_i x_j)$$

$$P_{suc \mid H(h_i,h_o)} = \sum_{n(\Phi),\,h_i(\Phi),\,h_o(\Phi)} \Pr(n(\Phi), h_i(\Phi), h_o(\Phi), 11_{suc})$$



3.3 Solving the Markov chain 

As can be seen from the above equations, $p_0$, $\pi_0$ and $\pi_1$ must be known in
advance in order to compute the steady state probabilities. From the Appendix
of this paper, the steady state probabilities are given by:

$$\pi_0 = 1 \Big/ \Big(1 + \alpha \sum_{l=1}^{b_i-1} \beta^{l-1} e + \alpha \beta^{b_i-2} \lambda B (I - B)^{-1} e\Big)$$

$$\pi_l = \begin{cases} \pi_0 \alpha \beta^{l-1}, & \text{for } 0 < l < b_i \\ \pi_0 \alpha \beta^{b_i-2} \lambda B (I - B)^{-1}, & \text{for } l = b_i \end{cases}$$

Notice that in steady state the following equation holds

$$p_0 = (1 - \lambda)\pi_0 \tag{7}$$

This naturally suggests an iterative solution (Youn et al 1994). Initially, $\pi_0$
is set to zero, which corresponds to the case of saturated offered loads. Then
$p_0$ can be obtained by using Eq (7). Since $p_0$ is known, the next value of
$\pi_0$ is computed again. This iterating process continues until both $p_0$ and $\pi_0$
converge, leading to the values of the steady state probabilities $\pi$.
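
The iterative procedure is an ordinary fixed-point iteration between $p_0$ and $\pi_0$. The sketch below shows only the control flow; `pi0_given_p0` is a hypothetical callback standing in for the full Markov-chain solution of this section, and the tolerance and round limit are arbitrary choices.

```python
def solve_fixed_point(lam, pi0_given_p0, tol=1e-10, max_rounds=1000):
    """Alternate between Eq (7) and the chain solution until both values settle.

    pi0_given_p0(p0) is a hypothetical callback returning pi_0 for a given p0.
    Returns the converged pair (p0, pi0).
    """
    pi0 = 0.0                                   # saturated start, as in the text
    p0 = (1 - lam) * pi0                        # Eq (7)
    for _ in range(max_rounds):
        new_pi0 = pi0_given_p0(p0)
        new_p0 = (1 - lam) * new_pi0            # Eq (7) again
        if abs(new_pi0 - pi0) < tol and abs(new_p0 - p0) < tol:
            return new_p0, new_pi0
        p0, pi0 = new_p0, new_pi0
    raise RuntimeError("fixed-point iteration did not converge")
```

With a toy stand-in such as `lambda p0: 0.5 + 0.4 * p0` and λ = 0.5, the iteration settles at $\pi_0$ = 0.625 and $p_0$ = 0.3125. The real mapping from $p_0$ to $\pi_0$ has to come from the matrices of Section 3.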



3.4 Computing the performance metrics 

Once the steady state probabilities are known, interesting performance
parameters such as throughput, mean queue length, mean cell delay and mean
cell loss probability can be computed directly from the known parameters. Let
$\rho$, $Q$, $D$ and $P_{loss}$ be the throughput, mean queue length, mean cell delay and mean
cell loss probability respectively, then

$$\rho = \lambda \pi_0 (1 - B_0 e) + \sum_{l=1}^{b_i} \sum_{u=1}^{N} \sum_{v=1}^{N} \pi_{(l,u,v)} P_{suc \mid W(u,v)}$$

$$Q = \sum_{l=1}^{b_i} l\, \pi_l e, \qquad D = Q/\rho, \qquad P_{loss} = \lambda \pi_{b_i} e$$

Figure 5 The throughput of the PIM switch as a function of offered load
with a buffer size $b_i$=10.
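
The performance metrics of Section 3.4 follow from the level probabilities with very little arithmetic. The sketch below assumes that the throughput has already been obtained, that the mean delay follows from Little's law (D = Q/ρ), and that a cell is lost when it arrives to a full tagged queue. The function name and the aggregated `level_probs` input are illustrative assumptions.

```python
def performance_metrics(lam, level_probs, throughput):
    """Mean queue length, mean delay (Little's law) and loss probability.

    level_probs[l] is the total steady-state probability of level l
    (pi_l * e) for l = 0 .. b_i; throughput is rho in cells per slot.
    """
    mean_queue = sum(l * p for l, p in enumerate(level_probs))
    mean_delay = mean_queue / throughput
    loss = lam * level_probs[-1]        # arrival finds the tagged queue full
    return mean_queue, mean_delay, loss
```

For the made-up values `level_probs = [0.5, 0.3, 0.2]`, λ = 0.4 and ρ = 0.35 this gives a mean queue length of 0.7 cells, a mean delay of 2 slots and a loss probability of 0.08.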



4 NUMERICAL RESULTS 

Both mathematical analysis and simulation results are presented in this
section in order to investigate the accuracy of the above queueing model.
Figure 5 shows the switch throughput as a function of offered load $\lambda$ for PIM
switch sizes 8 and 16 with PIM scheduling iteration numbers 1, 2
and 3, respectively. It can be seen that when the switch size increases, the
throughput of the switch decreases under high offered load (greater than 60%
when the maximum number of iterations is 1). Also from this figure, we can see
that the saturation throughput increases as the number of PIM scheduling
iterations increases. It is expected that with more iterations, more HOL cells
get matched during a time slot. The curves show that 3 iterations are enough to get
a high throughput > 90%. Comparing Figure 5 with Figure 1, we can see
that even under saturated traffic loads, our queueing model approximates the
original system quite well.

Figure 6 shows the mean cell delay as a function of offered load $\lambda$ for the
PIM switch sizes 8 and 16 with PIM iteration numbers 1, 2,
and 3. The figures indicate that the mean cell delay increases as the switch
size increases and also as the offered load increases. But when the number
of PIM scheduling iterations is increased, even from 1 to 2, the mean delay
increases more slowly with the traffic load than with just one iteration. For
single iteration PIM scheduling, the mean cell delay increases dramatically
when the offered load exceeds 60%, which indicates that PIM switches with
single iteration PIM scheduling will be overloaded when the traffic load is
greater than 60%. However, for 2 and 3 iteration PIM, this overload
point is about 0.8. This phenomenon can also be observed in Figure 5. Notice
that when the traffic load is extremely low, such as 0.1, all curves cluster
into a single point. It isn't difficult to understand that, under low traffic
load, the chance that more than one HOL cell contends for a common
input/output is small. That is, single iteration PIM scheduling is typically
enough to find a maximal matching. When the traffic load grows, the chances
of conflicts increase and more iterations of PIM scheduling are needed to
achieve a maximal matching. In this case, the analysis results diverge
significantly from the simulation results compared to the case where the traffic
load is low. This phenomenon is due to the approximation in computing the
transition probability of the multiple iteration PIM algorithm.

[Figure 6 panels: switch size N=8 and switch size N=16; -o-: analysis,
-+-: simulation.]

Figure 6 The mean cell delay of the PIM switch as a function of offered load
with a buffer size $b_i$=10.

Figure 7 The mean cell loss probability of a PIM switch, as a function of
offered load, with a buffer size $b_i$=10.

In Figure 7, the mean cell loss probabilities of PIM switches with queue size 
of 10 cells are given as a function of offered load. It can be seen that, for a 
medium size PIM switch with 3 iterations PIM scheduling (such as 16-by-16) 
with traffic load less than 60%, a buffer size of 10 cells per queue is sufficient 
to guarantee a cell loss probability < 10“®.



5 CONCLUSION



In order to make the original switch model tractable for analysis, a number
of assumptions have been made. The most important is that of random
traffic, that is, cells arrive at each input according to an i.i.d. Bernoulli
process, and the destination outputs of arriving cells are distributed over all
outputs uniformly. For non-random traffic loads, the analysis will be
more complicated than the one for random traffic loads. In future research,
we will try to apply this method to analyze the same kind of ATM
switches with bursty and correlated traffic.

The contribution of this paper is twofold. First, the throughput of an ATM
switch with multiple iteration PIM scheduling under saturated traffic
load is analyzed mathematically. Second, a theoretical analysis of various
performance parameters, including throughput, mean cell delay, and mean cell
loss probability, of an ATM switch using a PIM scheduling scheme is presented.
Such theoretical analysis is lacking in existing literature on ATM switches with
PIM or variations of PIM scheduling (Anderson et al 1993, McKeown 1994,
McKeown et al 1994, LaMaire et al 1994).



6 APPENDIX: COMPUTATION OF THE STEADY STATE 
PROBABILITIES 

Following the steps given in (Youn et al 1994), we give the procedure to
compute the steady state probabilities of the Markov chain. The method
presented in (Youn et al 1994) is based on the algorithmic approach given in
(Neuts 1981). From the definition of the transition probability matrix, we
know that $\Pi T = \Pi$. By expanding this equation, we have:

$$\pi_0((1 - \lambda) + \lambda S_0) + \pi_1 (1 - \lambda) S_e = \pi_0 \tag{8}$$

$$\pi_0 \lambda B_0 + \pi_1(\lambda S + (1 - \lambda)B) + \pi_2 (1 - \lambda) S = \pi_1 \tag{9}$$

$$\pi_{i-1}\lambda B + \pi_i(\lambda S + (1 - \lambda)B) + \pi_{i+1}(1 - \lambda)S = \pi_i, \quad \text{for } 1 < i < b_i - 1 \tag{10}$$

$$\pi_{b_i-2}\lambda B + \pi_{b_i-1}(\lambda S + (1 - \lambda)B) + \pi_{b_i} S = \pi_{b_i-1} \tag{11}$$

$$\pi_{b_i-1}\lambda B + \pi_{b_i} B = \pi_{b_i} \tag{12}$$

Multiplying Eq (10) by $e$ results in:

$$\pi_{i-1}\lambda B_e = \pi_i (1 - \lambda) S_e. \tag{13}$$

The solution for $\pi_i$ $(1 < i < b_i)$ in terms of $\pi_{i-1}$ can be obtained by
multiplying Eq (10) by $h$ and using Eq (13), where $h = e e_1$ and $e_1 = [1, 0, 0, \ldots, 0]$.

$$\pi_i (h - \lambda S h - B h) = \pi_{i-1} \lambda B_e e_1 \tag{14}$$

Multiplying Eq (4) by $e_1$ and substituting it into Eq (14), we have a recursive
formula for $\pi_i$ in terms of matrix $B$.

$$\pi_i = \pi_1 \left(\lambda B ((1 - \lambda)(I - B))^{-1}\right)^{i-1} \tag{15}$$

From Eq (12), we have:

$$\pi_{b_i} = \pi_{b_i-1} \lambda B (I - B)^{-1} \tag{16}$$

Let $\alpha = \lambda B_0 (I - \lambda I_1 - (1 - \lambda)B)^{-1}$ and $\beta = \lambda B((1 - \lambda)(I - B))^{-1}$. Using
Eq (8), we get:

$$\pi_1 = \lambda B_0 (I - \lambda I_1 - (1 - \lambda)B)^{-1} \pi_0 = \pi_0 \alpha \tag{17}$$

Using Eqs (15), (16) and (17),

$$\pi_l = \begin{cases} \pi_0 \alpha \beta^{l-1}, & \text{for } 0 < l < b_i \\ \pi_0 \alpha \beta^{b_i-2} \lambda B (I - B)^{-1}, & \text{for } l = b_i \end{cases}$$


Noticing that $\pi_0 + \sum_{l=1}^{b_i} \pi_l e = 1$, we have:

$$\pi_0 = 1 \Big/ \Big(1 + \alpha \sum_{l=1}^{b_i-1} \beta^{l-1} e + \alpha \beta^{b_i-2} \lambda B (I - B)^{-1} e\Big)$$



Abstract

In this paper, some results are presented from an attempt to 
study - in a discrete-time setting - the phenomenon of long-range 
dependence. For a well-known source model, traffic characteristics 
such as the power spectral density and the index of dispersion for 
counts are analyzed. Based on these characteristics, the distinc- 
tion between short-range and long-range dependence is touched 
upon. The traffic generated by a superposition of sources is also
studied, whereby the case of an infinite number of sources gets special
attention. In this, the discrete-time version of the M/G/∞
queue model is called upon. Finally, some results pertaining to
the queueing behavior of such traffic are discussed.



1 Introduction 

Since the notions 'long-range dependence' and 'self-similarity' [1, 10,
16] have been brought to the attention of the teletraffic community,
a tremendous amount of research effort has been spent on the subject.
Issues of importance hereby are traffic analysis and modelling,
and queueing analysis. Fluid-flow models [2, 8, 9, 12] and the related
fractional Brownian storage [13, 14, 18], combined with elements of
large-deviations theory, seem to have been the most successful modelling
approaches yet, be it sometimes at the cost of complex mathematics.

^The authors wish to thank the Belgian Federal Office for Scientific, Technical 
and Cultural Affairs (DWTC) and the Flemish Fund for Scientific Research (FWO- 
Vlaanderen) for support of this research. 




What we present here, are some results from an attempt to study 
these notions in a discrete-time setting. As a source model, we opted 
for the well-known on-off source. Long-range dependence is expected to 
emerge when heavy-tailed distributions for e.g. the durations of the on- 
periods come into play. These heavy-tailed distributions typically lead to 
probability generating functions - one of the basic tools of our analysis - 
having a branch point at $z = 1$. This branch point affects the use that is
made of residue theory and urges us to reconsider some results obtained 
for ’light-tailed’ distributions. This work is part of ongoing research and 
additional study is required to fill in the remaining gaps and to provide 
a more solid mathematical framework. 

The paper is structured as follows. In the next section, the on-off 
source model is introduced. In Section 3, traffic characteristics such 
as the autocovariance function and the index of dispersion for counts 
are analyzed. The difference between short- and long-range dependent 
sources is examined in Section 4. In Section 5, the superposition of 
N sources is considered, whereby the case N → ∞ receives the most
attention. It leads to a discrete-time M/G/∞ queue model, recently
also studied in e.g. [15, 19]. In Section 6, two possible approaches to the 
analysis of the queueing behavior - a Benes approach and a slot-to-slot 
approach - are discussed. Conclusions are drawn in Section 7. 



2 The source model 

An on-off source alternates between two states : the on-state - wherein 
one cell is generated per slot - and the off-state - wherein no cells are 
generated, as shown in Figure 1. 



[Figure: slot axis on which on-periods, with pgf A(z), alternate with off-periods, with pgf B(z).]

Figure 1: A discrete-time on-off source. 

The durations, expressed in numbers of slots, of the visits to the
on-state - called the on-periods - are iid random variables (rv's)
characterized by the probability density function (pdf)
$a(n) = \Pr[\tau_A = n]$ ($n = 1, 2, \ldots$) or the associated probability
generating function (pgf)

$$A(z) = E[z^{\tau_A}] = \sum_{n=1}^{+\infty} a(n)\, z^n$$

Likewise, the durations of the off-periods are iid rv's characterized by
the pdf $b(n) = \Pr[\tau_B = n]$ or the pgf $B(z) = E[z^{\tau_B}]$. Hereby, $\tau_A$
and $\tau_B$ were used to denote the duration of a generic on- or off-period
respectively. Durations of on- and off-periods are mutually independent;
their mean values equal

$$E[\tau_A] = A'(1) \quad \text{and} \quad E[\tau_B] = B'(1)$$

respectively. Unless otherwise stated, these values are assumed to be 
finite. As a consequence, the stationary version of the process, which is 
of interest here, exists. Variances are given by 

$$\sigma_A^2 = \mathrm{Var}[\tau_A] = A''(1) + A'(1) - A'(1)^2$$

and

$$\sigma_B^2 = \mathrm{Var}[\tau_B] = B''(1) + B'(1) - B'(1)^2$$

and can be either finite or infinite. 

Two distributions that are frequently encountered in the rest of the
analysis are the distributions of the remaining durations $\tau_A^*$ or $\tau_B^*$
of the on- or off-period to which a randomly chosen slot belongs (not
counting the arbitrary slot itself). It can be shown, see e.g. [3], that,
for on-periods, this distribution is given by

$$a^*(n) = \Pr[\tau_A^* = n] = \frac{\Pr[\tau_A > n]}{E[\tau_A]}, \qquad n = 0, 1, \ldots$$

and mutatis mutandis for the off-periods. The associated pgf's take the
form

$$A^*(z) = \frac{A(z) - 1}{A'(1)(z - 1)} \quad \text{and} \quad B^*(z) = \frac{B(z) - 1}{B'(1)(z - 1)}$$
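
The relation between a duration distribution and the distribution of the remaining duration can be checked numerically. The following is a minimal sketch, assuming geometric on-periods truncated after a finite number of terms; the function names `residual_pmf` and `pgf` and the parameter value are illustrative choices and not part of the analysis itself.

```python
def residual_pmf(a):
    """Return a*(n) = Pr[tau > n] / E[tau] for n = 0 .. len(a) - 1.

    a[n - 1] holds Pr[tau = n] for n = 1 .. len(a) (a truncated pmf).
    """
    mean = sum(n * p for n, p in enumerate(a, start=1))
    tail = sum(a)                 # Pr[tau > 0]
    a_star = []
    for p in a:
        a_star.append(tail / mean)
        tail -= p                 # Pr[tau > n] becomes Pr[tau > n + 1]
    return a_star


def pgf(pmf, z, first_index):
    """Evaluate sum_n pmf(n) z^n, where pmf[0] belongs to n = first_index."""
    return sum(p * z ** n for n, p in enumerate(pmf, start=first_index))


# Hypothetical example: geometric on-periods with mean 1/alpha = 4 slots.
alpha = 0.25
a = [alpha * (1 - alpha) ** (n - 1) for n in range(1, 400)]
a_star = residual_pmf(a)

z = 0.5
mean_a = sum(n * p for n, p in enumerate(a, start=1))
lhs = pgf(a_star, z, 0)                          # A*(z) computed from a*(n)
rhs = (pgf(a, z, 1) - 1) / (mean_a * (z - 1))    # (A(z) - 1) / (A'(1)(z - 1))
```

For a geometric distribution the remaining duration is again geometric, now on 0, 1, 2, ..., which is what `a_star` reproduces; `lhs` and `rhs` agree up to rounding.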



The number of cells generated by the source during slot $k$, either 0 or
1, will be denoted by $q_k$. The average of $q_k$ can be expressed as

$$\lambda = E[q_k] = \frac{E[\tau_A]}{E[\tau_A] + E[\tau_B]}$$

and its variance as

$$\sigma^2 = \mathrm{Var}[q_k] = \lambda (1 - \lambda)$$

In the next section, some further characteristics of the traffic process $q_k$
are derived.



3 Traffic characteristics 

3.1 The power spectral density 

The Fourier transform

$$S(f) = \sum_{m=-\infty}^{+\infty} C(m)\, e^{2\pi\imath f m}$$

of the autocovariance function

$$C(m) = E[(q_0 - \lambda)(q_m - \lambda)]$$

is known as the power spectral density of the traffic process. Studies such
as [5] or [11] come to the conclusion that the power spectral density at
low frequencies has a serious impact on the queueing behavior of
the traffic. The power spectral density of long-range dependent traffic
behaves totally differently at these frequencies from that of short-range
dependent traffic. We return to this in Section 4.

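In Appendix A, it is shown that

$$S(f) = \sigma^2 \left( 1 + Q(e^{2\pi\imath f}) + Q(e^{-2\pi\imath f}) \right) \qquad (1)$$

whereby

$$\sum_{m=1}^{+\infty} C(m)\, z^m = \sigma^2 Q(z) = \sigma^2\, \frac{z\,(P(z) - 1)}{z - 1}$$

with

$$P(z) = \frac{A(z) - 1}{A'(1)(z - 1)} \cdot \frac{B(z) - 1}{B'(1)(z - 1)} \cdot \frac{[A'(1) + B'(1)](z - 1)}{A(z)B(z) - 1}$$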
Given $A(z)$ and $B(z)$, $S(f)$ is easily evaluated. As such, $C(m)$ can
be calculated by numerical transform inversion, as outlined in e.g. [4].
Further, it follows from residue theory that

$$C(m) = \mathrm{Res}_{z=0} \left[ \sigma^2\, \frac{Q(z)}{z^{m+1}} \right]$$

If the only singularities of $Q(z)$ are poles outside the unit disk, say $z_i$,
the above can be rewritten as

$$C(m) = - \sum_i \mathrm{Res}_{z=z_i} \left[ \sigma^2\, \frac{Q(z)}{z^{m+1}} \right]$$


For large m, one can then retain only the dominant contribution and
obtain an approximation for $C(m)$. In that case, the decay of the
autocovariance will be dominated by a geometric term, i.e., the process will
be short-range dependent, as will be discussed later on. For long-range
dependent processes, (one of) the generating functions $A(z)$ or $B(z)$ will
have a branch point at $z = 1$, due to the 'heavy tail' of the distribution
involved. Then, the result from residue theory should be reformulated
and a non-geometric term will dominate the decay of the autocovariance
function.
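
The evaluation of the power spectral density from the two pgf's can be sketched numerically. The block below is a minimal illustration, assuming geometric on- and off-periods (means 100 and 25 slots, chosen here only as an example) and the relations $Q(z) = z(P(z) - 1)/(z - 1)$ and $S(f) = \sigma^2(1 + Q(e^{2\pi\imath f}) + Q(e^{-2\pi\imath f}))$; the function names are hypothetical. For this particular choice the autocovariance is geometric, $C(m) = \sigma^2 \delta^{|m|}$ with $\delta = 1 - \alpha - \beta$, which gives a closed form to compare against.

```python
import cmath
import math


def geometric_pgf(p):
    """pgf z -> p z / (1 - (1 - p) z) of a geometric duration with mean 1/p."""
    return lambda z: p * z / (1 - (1 - p) * z)


def power_spectral_density(f, A, B, mean_a, mean_b):
    """Evaluate S(f) = sigma^2 (1 + Q(w) + Q(1/w)) with w = exp(2 pi i f), f != 0."""
    lam = mean_a / (mean_a + mean_b)
    sigma2 = lam * (1 - lam)

    def P(z):
        a_star = (A(z) - 1) / (mean_a * (z - 1))
        b_star = (B(z) - 1) / (mean_b * (z - 1))
        return a_star * b_star * (mean_a + mean_b) * (z - 1) / (A(z) * B(z) - 1)

    def Q(z):
        return z * (P(z) - 1) / (z - 1)

    w = cmath.exp(2j * math.pi * f)
    return (sigma2 * (1 + Q(w) + Q(1 / w))).real


# Assumed example parameters: mean on-period 100 slots, mean off-period 25 slots.
alpha, beta = 0.01, 0.04
f = 0.1
s_numeric = power_spectral_density(
    f, geometric_pgf(alpha), geometric_pgf(beta), 1 / alpha, 1 / beta
)

# Closed form for a geometric autocovariance sigma^2 * delta^|m|.
delta = 1 - alpha - beta
sigma2 = 0.8 * 0.2
s_closed = sigma2 * (1 - delta ** 2) / (1 - 2 * delta * math.cos(2 * math.pi * f) + delta ** 2)
```

The function is not defined at $f = 0$, where $z = 1$ is a removable singularity of $P$ and $Q$; the DC-component has to be obtained as a limit.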

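A quantity of interest in Section 4 is the so-called 'DC-component' of
the power spectral density, given by

$$S(0) = \sigma^2 \left[ \left( \frac{\sigma_A^2}{E[\tau_A]^2} + \frac{\sigma_B^2}{E[\tau_B]^2} \right) \frac{E[\tau_A]\, E[\tau_B]}{E[\tau_A] + E[\tau_B]} \right] \qquad (2)$$
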



3.2 Index of dispersion for counts 

Another well-known traffic characteristic, the index of dispersion for
counts, was discussed in e.g. [6]. It is defined as

$$\mathrm{IDC}(m) = \frac{\mathrm{Var}[q_1 + \ldots + q_m]}{E[q_1 + \ldots + q_m]}$$

and is related to the autocovariance function $C(m)$ by [6]

$$\mathrm{IDC}(m) = \frac{1}{\lambda} \sum_{k=-m}^{m} \left( 1 - \frac{|k|}{m} \right) C(k) \qquad (3)$$

In this respect, both traffic characteristics convey the same information.
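
Relation (3) translates directly into a short computation. The sketch below assumes, purely for illustration, a geometrically decaying autocovariance $C(k) = \sigma^2 \delta^{|k|}$ with $\lambda = 0.8$ and $\delta = 0.95$; the function name `idc` is hypothetical.

```python
def idc(m, autocov, lam):
    """IDC(m) = (1 / lam) * sum_{k=-m}^{m} (1 - |k| / m) * C(k), relation (3)."""
    total = sum((1 - abs(k) / m) * autocov(abs(k)) for k in range(-m, m + 1))
    return total / lam


# Assumed example: geometric autocovariance sigma^2 * delta^|k|.
lam = 0.8
sigma2 = lam * (1 - lam)
delta = 0.95


def autocov(k):
    return sigma2 * delta ** k


idc_1 = idc(1, autocov, lam)              # only C(0) contributes
idc_large = idc(100000, autocov, lam)     # close to the limit for m -> infinity
idc_limit = (sigma2 / lam) * (1 + delta) / (1 - delta)
```

For m = 1 the index reduces to $\sigma^2/\lambda = 1 - \lambda$; for growing m it approaches the sum of all autocovariances divided by $\lambda$.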
Consider the double transform

$$J(z,t) = \sum_{m=1}^{+\infty} t^m\, E[z^{q_1 + \ldots + q_m}]$$

of the number of arrivals during m consecutive slots. One can show that,
for the on-off source model used here,

$$J(z,t) = \frac{1}{A'(1) + B'(1)} \left( \frac{A'(1)\, zt}{1 - zt} + \frac{B'(1)\, t}{1 - t} - \frac{A'(1) B'(1)\, t^2 (z - 1)^2 A^*(zt) B^*(t)}{(1 - zt)(1 - t)(1 - A(zt)B(t))} \right)$$


as explained in Appendix B. 

Taking the first-order partial derivative with respect to $z$ at $z = 1$,
one finds

$$\left. \frac{\partial}{\partial z} J(z,t) \right|_{z=1} = \sum_{m=1}^{+\infty} t^m\, E[q_1 + \ldots + q_m] = \frac{1}{A'(1) + B'(1)} \cdot \frac{A'(1)\, t}{(1 - t)^2}$$

from which follows the obvious result

$$E[q_1 + \ldots + q_m] = m \lambda$$



Taking the second-order partial derivative with respect to $z$ at $z = 1$,
one finds after some further manipulation

$$\sum_{m=1}^{+\infty} t^m\, \mathrm{Var}[q_1 + \ldots + q_m] = \sigma^2\, \frac{t\,(1 + t - 2tP(t))}{(1 - t)^3} \qquad (4)$$

From this, one can derive the result

$$\Phi(t) = \sum_{m=1}^{+\infty} \mathrm{IDC}(m)\, t^m = \frac{\sigma^2}{\lambda} \int_0^t \frac{1 + s - 2sP(s)}{(1 - s)^3}\, ds \qquad (5)$$

As for $C(m)$, transform inversion, be it numerical or based on residue
theory, of equation (4) or (5), then yields $\mathrm{Var}[q_1 + \ldots + q_m]$ or IDC(m).
(For this, equation (4) seems more appropriate, since it is of a simpler
nature.) Of course, the same problems with transform inversion as for
the power spectral density mentioned above will emerge when long-range
dependent processes are considered.
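From (5), one can also derive that

$$\mathrm{IDC}(+\infty) = \frac{S(0)}{\lambda} = (1 - \lambda) \left( \frac{\sigma_A^2}{E[\tau_A]^2} + \frac{\sigma_B^2}{E[\tau_B]^2} \right) \frac{E[\tau_A]\, E[\tau_B]}{E[\tau_A] + E[\tau_B]}$$

as follows from equations (3) and (2), see [6].
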



3.3 A special case 



A source whereby transitions from one state to the other form a two- 
state Markov chain, is a special case of the more general on-off source 
model considered here. It is obtained when the distributions for the 
durations of the on- and off-periods are geometric with mean $1/\alpha$ and
$1/\beta$ respectively, i.e., when

$$A(z) = \frac{\alpha z}{1 - (1 - \alpha) z} \quad \text{and} \quad B(z) = \frac{\beta z}{1 - (1 - \beta) z}$$

The transition probabilities from the on- to the off-state and vice versa
are then given by $\alpha$ and $\beta$ respectively. Traffic characteristics for this
specific model can be calculated using standard techniques from Markov
chain analysis - see e.g. [7]. One finds, amongst others,

$$C(m) = \sigma^2 \delta^{|m|}$$

and

$$\mathrm{Var}[q_1 + \ldots + q_m] = \sigma^2 \left( m + \frac{2\delta}{1 - \delta} \left( m - \frac{1 - \delta^m}{1 - \delta} \right) \right)$$

Hereby $\delta = 1 - \alpha - \beta$ is one of the two eigenvalues of the transition matrix
governing the Markov chain, the other being 1. It has been verified that
these results are in full agreement with the ones obtained in the previous
subsections for the more general model. For instance, $1/\delta$ is the only
pole of $P(t)$, which is now given by

$$P(t) = \frac{1 - \delta}{1 - \delta t}$$

Transform inversion is then quite straightforward and leads to the above
expressions.
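
The agreement between the transform of the variance and the closed-form expression for the two-state Markov source can be checked by expanding the transform as a power series. This is a minimal sketch, assuming $P(t) = (1 - \delta)/(1 - \delta t)$ and the generating function $\sigma^2\, t\,(1 + t - 2tP(t))/(1 - t)^3$ for the variances; the function names and parameter values are illustrative. Division by $(1 - t)$ corresponds to a running sum of the coefficients, which is applied three times.

```python
def variance_from_transform(delta, sigma2, m_max):
    """Coefficients of sigma^2 * t * (1 + t - 2 t P(t)) / (1 - t)^3.

    Returns a list v with v[m - 1] = Var[q_1 + ... + q_m] for m = 1 .. m_max,
    assuming P(t) = (1 - delta) / (1 - delta t).
    """
    # Series coefficients of the numerator 1 + t - 2 t P(t).
    num = [0.0] * m_max
    num[0] = 1.0
    for j in range(1, m_max):
        linear = 1.0 if j == 1 else 0.0
        num[j] = linear - 2 * (1 - delta) * delta ** (j - 1)
    # Each division by (1 - t) is a running sum of the coefficients.
    coeffs = num
    for _ in range(3):
        acc, summed = 0.0, []
        for c in coeffs:
            acc += c
            summed.append(acc)
        coeffs = summed
    # The leading factor t shifts the series by one position.
    return [sigma2 * c for c in coeffs]


def variance_closed_form(m, delta, sigma2):
    """sigma^2 * (m + 2 delta / (1 - delta) * (m - (1 - delta^m) / (1 - delta)))."""
    return sigma2 * (m + 2 * delta / (1 - delta) * (m - (1 - delta ** m) / (1 - delta)))


# Assumed example parameters.
delta, sigma2 = 0.95, 0.16
series = variance_from_transform(delta, sigma2, 50)
closed = [variance_closed_form(m, delta, sigma2) for m in range(1, 51)]
```

Both lists agree up to rounding, which illustrates for this special case that inverting the transform reproduces the result of the Markov chain analysis.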


4 Short-range versus long-range dependence

By definition [10], long-range dependence is present when

$$\sum_{m=-\infty}^{+\infty} C(m) = S(0) = +\infty$$

Recalling equation (2), we see that this will be the case when, for
instance, $\sigma_A^2 = \mathrm{Var}[\tau_A]$ is infinite (given $E[\tau_A]$ is finite), the so-called
'infinite variance syndrome' [10]. This condition implies that $A(z)$ has
a singularity at $z = 1$, that cannot be a pole since $A(1) = 1$, but is a
branch point. As a consequence, the distribution $a(n)$ of which $A(z)$ is
the pgf, will not decay geometrically, but hyperbolically, i.e., $a(n)$ will
have a heavy tail.

A well-known heavy-tailed distribution of a continuous random variable
is e.g. the Pareto distribution [16]. In search of a versatile
heavy-tailed distribution for a discrete-time random variable, we focussed on
a distribution based on the hypergeometric function

$$F(\alpha, \beta; \gamma; z) = \sum_{n=0}^{+\infty} \frac{\Gamma(\alpha + n)\, \Gamma(\beta + n)\, \Gamma(\gamma)}{\Gamma(\alpha)\, \Gamma(\beta)\, \Gamma(\gamma + n)\, n!}\, z^n$$

To be more specific, we used a generating function of the form

$$A(\alpha, \beta; \gamma; z) = z\, \frac{F(\alpha, \beta; \gamma; z)}{F(\alpha, \beta; \gamma; 1)}$$

The resulting distribution seems versatile in the sense that its pgf is
based on a well-studied function for which numerical procedures are
available [17] and that it has three (real-valued) parameters, which can
be fitted to yield e.g. a given mean and tail decay. The pgf has a branch
point at $z = 1$ and the tail of the distribution decays hyperbolically as

$$a(\alpha, \beta; \gamma; n) \sim \frac{\Gamma(\gamma - \alpha)\, \Gamma(\gamma - \beta)}{\Gamma(\alpha)\, \Gamma(\beta)\, \Gamma(\gamma - \alpha - \beta)}\, n^{\alpha + \beta - \gamma - 1} \qquad (6)$$

for $n \gg 1$. The variance is infinite and long-range dependence will result
whenever $1 < \gamma - \alpha - \beta < 2$. (The lower bound is required for the mean
to be finite.)
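
The distribution behind this pgf can be tabulated with a simple recursion on the series coefficients of the hypergeometric function. The following is a minimal sketch, assuming the parameter values $\alpha = \beta = 1$ and $\gamma = 3.5$ (so that $\gamma - \alpha - \beta = 1.5$ and the tail decays as $n^{-2.5}$); these values, like the function names, are chosen for illustration only and are not the ones fitted for the examples in this paper. The normalisation uses Gauss' summation formula for $F(\alpha, \beta; \gamma; 1)$.

```python
import math


def hypergeometric_pmf(alpha, beta, gamma, n_max):
    """Return [a(1), ..., a(n_max)] for the pgf z F(a,b;c;z) / F(a,b;c;1)."""
    # Gauss' summation formula, valid for gamma - alpha - beta > 0.
    f_one = (math.gamma(gamma) * math.gamma(gamma - alpha - beta)
             / (math.gamma(gamma - alpha) * math.gamma(gamma - beta)))
    pmf = []
    c = 1.0                       # coefficient of z^0 in F
    for n in range(n_max):
        pmf.append(c / f_one)     # a(n + 1) is the coefficient of z^n, normalised
        c *= (alpha + n) * (beta + n) / ((gamma + n) * (n + 1))
    return pmf


def tail_constant(alpha, beta, gamma):
    """Constant in front of n^(alpha + beta - gamma - 1) in the tail behaviour."""
    return (math.gamma(gamma - alpha) * math.gamma(gamma - beta)
            / (math.gamma(alpha) * math.gamma(beta) * math.gamma(gamma - alpha - beta)))


# Assumed illustrative parameters: tail exponent alpha + beta - gamma - 1 = -2.5.
alpha, beta, gamma = 1.0, 1.0, 3.5
n_max = 20000
a = hypergeometric_pmf(alpha, beta, gamma, n_max)
asymptote = tail_constant(alpha, beta, gamma) * n_max ** (alpha + beta - gamma - 1)
ratio = a[-1] / asymptote         # a(n_max) relative to its hyperbolic asymptote
total_mass = sum(a)
```

With these parameters the ratio is close to one and the truncated probabilities sum to nearly one; the missing mass is the hyperbolically decaying tail beyond `n_max`.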

Throughout this paper, three different distributions for the on-periods
will be used for illustrative purposes, as summarized in Table 1: a
light-tailed geometric distribution (A), a heavy-tailed distribution (B) of the
form (6) with infinite variance and a third distribution (C), also of the
form (6), with finite variance but infinite third moment. Including the
latter distribution will allow us to distinguish between 'long-range
dependent' features and features originating from a 'heavy tail'. All three
distributions have mean $E[\tau_A] = 100.0$ slots.





      variance    tail behavior
A     < +∞        geometric
B     = +∞        ~ n^{-2.5}
C     < +∞        ~ n^{-3.5}

Table 1: Three different distributions

In Figure 2, where these distributions are plotted, the slow decay
of the tails of distributions (B) and (C) clearly shows. In Figure 3, a
log-log plot of the complementary cumulative distribution, this is even
more apparent.




Figure 2: Tail behavior, $\log \Pr[\tau_A = n]$ versus $n$, for various types of distributions.

Figure 3: Tail behavior, $\log \Pr[\tau_A > n]$ versus $\log n$, for various types of distributions.

Power spectral densities of sources with on-periods as introduced
above are shown in Figure 4. A geometrically distributed off-period,
with mean 25.0 slots, was assumed for all cases, yielding a traffic
intensity of 0.8 Erlang. It is known [10] that, for long-range dependent
sources, $S(f) \sim f^{-\nu}$ or $\log S(f) \sim -\nu \log f$, when $f \to 0$, while for
short-range dependent sources, $\log S(f) \sim \log S(0)$. Both types of
behavior are clearly distinguishable in Figure 4. Note that while the tail
of distribution (C) is also hyperbolic and thus 'heavy', it decays too
fast to yield long-range dependence in the strict sense. Corresponding
autocovariance functions, obtained numerically, are shown in Figure 5.

Evidence for long-range dependence is also present in the sample 
traces presented in Figure 6. The figure was obtained by aggregating the 
traffic over various timescales (1,10,100,. . . 10^ slots respectively). The 
traffic was generated by a superposition (see Section 5) of 5 iid sources 
and the total traffic intensity is 0.8 Erlang. In traffic of type (B), large 
fluctuations occur over large time-scales, while for traffic of type (A) and 
(C), fluctuations quickly die out as the time scale increases. 



5 Superposition 

5.1 N sources 

Traffic characteristics of a superposition of N identical and independent
sources are easily derived from those of a single source. Assume the N
sources generate the aggregate traffic stream $p_k$. The mean total arrival
rate is then $\lambda_T = E[p_k] = N\lambda$. Since the sources are independent, the
autocovariance function and the power spectral density of the aggregate
stream equal N times those of a single source, and

$$E[z^{p_1 + \ldots + p_m}] = \left( E[z^{q_1 + \ldots + q_m}] \right)^N$$

From the latter follows easily

$$\mathrm{Var}[p_1 + \ldots + p_m] = N\, \mathrm{Var}[q_1 + \ldots + q_m] \qquad (7)$$

Figure 4: Power spectral density, $\log S(f)$ versus $\log f$, for various types of source.

Figure 5: Autocovariance function, $C(m)$ versus $m$, for various types of sources.





Figure 7: Power spectral density, $\log S(f)$ versus $\log f$, for a superposition of sources of type A.

For illustrative purposes, Figures 7 and 8 show the power spectral
density for a superposition of 1, 2, 5 and an infinite number of sources.
(The latter case is treated in more detail below.) Figure 7 is for a short- 
range dependent source of type (A), Figure 8 for a long-range dependent 
source of type (B). The total arrival rate was kept constant at 0.8 Erlang 
by varying the mean duration of the off-periods. 

5.2 N → +∞

An interesting case is that whereby the number of superposed sources
grows infinitely, with given $\lambda_T$ and $A(z)$. As illustrated by Figures 7
and 8, traffic characteristics quickly approach their limiting values as
the number of sources increases. For the power spectral density, we find
by taking a limit

$$S_{(\infty)}(f) = \lim_{N \to \infty} N\, S(f) = \lambda_T \left( 1 + Q_{(\infty)}(e^{2\pi\imath f}) + Q_{(\infty)}(e^{-2\pi\imath f}) \right)$$

whereby

$$Q_{(\infty)}(z) = \frac{z\,(A^*(z) - 1)}{z - 1} \qquad (8)$$

Figure 8: Power spectral density, $\log S(f)$ versus $\log f$, for a superposition of sources of type B.

Concerning the total number of arrivals in m consecutive slots, see Section
3.2, derivation of

$$J_{(\infty)}(z,t) = \lim_{N \to \infty} \sum_{m=1}^{+\infty} t^m\, E[z^{p_1 + \ldots + p_m}]$$

through a limiting procedure seems more cumbersome. For e.g. the
variance of that number, on the contrary, we easily obtain

$$\sum_{m=1}^{+\infty} t^m\, \mathrm{Var}[p_1 + \ldots + p_m] = \lambda_T\, \frac{t\,(1 + t - 2tA^*(t))}{(1 - t)^3}$$

through equations (4) and (7). Numerical or approximate transform
inversion then again yields $C(m)$ or $\mathrm{Var}[p_1 + \ldots + p_m]$.

By observing, however, that the number of arrivals in a slot is
equivalent to the number of customers in a discrete-time GI-G-∞ queue,
the equivalent of the continuous-time M/G/∞ queue, some more results
can be obtained. Recently this model has been studied in e.g. [15, 19].

One can show that the numbers of newly arriving 'customers' in
each slot become iid rv's with a Poisson distribution, with mean
$\lambda^* = \lambda_T / A'(1)$ and pgf

$$\exp\{\lambda^*(z - 1)\}$$

The service times of the customers are, of course, also iid rv's with
pgf $A(z)$, the on-time distribution. By analyzing this equivalent queue
model on a slot-to-slot basis, it is possible to derive e.g. that

$$C(m) = \lambda^* \sum_{k=m}^{+\infty} \Pr[\tau_A > k]$$

Note that this is in full agreement with equation (8) derived above.
This expression illustrates once more that light-tailed on-periods lead to
short-range dependence, since

$$\Pr[\tau_A = m] \sim z_0^{-m} \;\Rightarrow\; C(m) \sim z_0^{-m} \;\Rightarrow\; \sum_{m=-\infty}^{+\infty} C(m) < +\infty$$

On the other hand, for heavy-tailed on-periods one has

$$\Pr[\tau_A = m] \sim m^{-s} \;\Rightarrow\; C(m) \sim m^{-s+2} \;\Rightarrow\; \sum_{m=-\infty}^{+\infty} C(m) = +\infty$$

when $2 < s < 3$. Infinite variances for the on-periods thus lead to
long-range dependence. Note, however, that also for $s > 3$, as for traffic
of type (C), the autocovariance function may decay quite slowly, i.e.,
correlation may extend over long time periods. It does not, however,
lead to long-range dependence in the strict sense.
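
The autocovariance of the limiting process is a double tail sum of the on-period distribution, which is easy to tabulate. The sketch below assumes a truncated pmf given as a list and uses geometric on-periods with mean 100 slots and load 0.8 as a hypothetical example; for that case the result is $\lambda_T (1 - \alpha)^m$, which serves as a check. The function name is illustrative.

```python
def autocovariance(a, lam_star, m_max):
    """C(m) = lam_star * sum_{k >= m} Pr[tau > k] for m = 0 .. m_max.

    a[n - 1] holds Pr[tau = n] for n = 1 .. len(a) (a truncated pmf);
    m_max must not exceed len(a).
    """
    n = len(a)
    # ccdf[k] = Pr[tau > k] for k = 0 .. n (zero beyond the truncation point).
    ccdf = [0.0] * (n + 1)
    for k in range(n - 1, -1, -1):
        ccdf[k] = ccdf[k + 1] + a[k]
    # tail_sum[m] = sum_{k >= m} ccdf[k].
    tail_sum = [0.0] * (n + 2)
    for k in range(n, -1, -1):
        tail_sum[k] = tail_sum[k + 1] + ccdf[k]
    return [lam_star * tail_sum[m] for m in range(m_max + 1)]


# Assumed example: geometric on-periods, mean 100 slots, total load 0.8.
alpha = 0.01
lam_t = 0.8
lam_star = lam_t * alpha                      # lam_star = lam_t / E[tau]
a = [alpha * (1 - alpha) ** (n - 1) for n in range(1, 5001)]
C = autocovariance(a, lam_star, 200)
```

Replacing the geometric list by a heavy-tailed one shows the slower, hyperbolic decay of `C`; in that case the truncation point has to be chosen much larger to capture the tail.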

One can further derive that

$$E[z^{p_1 + \ldots + p_m}] = \exp\left\{ \lambda^* \sum_{k=m}^{+\infty} \Pr[\tau_A > k](z^m - 1) + \lambda^* \sum_{k=0}^{m-1} \Pr[\tau_A > k](z^k - 1) + \lambda^* \sum_{k=0}^{m-1} \Pr[\tau_A > k](m - k)\, z^k (z - 1) \right\}$$

The first two sums in the RHS represent the contribution of 'old' sources,
i.e., sources that were already active prior to slot 1. The last sum
represents the contribution of sources that started generating cells during slot
1 or later. Taking derivatives and performing some algebra, one obtains

$$\mathrm{Var}[p_1 + \ldots + p_m] = m^2 \lambda_T - \lambda^* \sum_{k=0}^{m-1} \Pr[\tau_A > k](m - k)(m - k - 1)$$

in agreement with the result obtained through a limiting procedure.
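
The variance expression can be cross-checked against the general identity $\mathrm{Var}[p_1 + \ldots + p_m] = m\,C(0) + 2\sum_{j=1}^{m-1}(m - j)\,C(j)$. The sketch below does this under the assumption of geometric on-periods, for which $\Pr[\tau_A > k] = (1 - \alpha)^k$ and $C(j) = \lambda^* (1 - \alpha)^j / \alpha$; the parameter values and function names are illustrative.

```python
def variance_closed_form(m, lam_star, mean_on, ccdf):
    """m^2 lam_T - lam_star * sum_{k=0}^{m-1} Pr[tau > k] (m - k)(m - k - 1)."""
    lam_t = lam_star * mean_on
    correction = sum(ccdf(k) * (m - k) * (m - k - 1) for k in range(m))
    return m * m * lam_t - lam_star * correction


def variance_from_autocovariance(m, autocov):
    """m C(0) + 2 * sum_{j=1}^{m-1} (m - j) C(j)."""
    return m * autocov(0) + 2 * sum((m - j) * autocov(j) for j in range(1, m))


# Assumed example: geometric on-periods with mean 1/alpha = 20 slots.
alpha = 0.05
lam_star = 0.02


def ccdf(k):
    return (1 - alpha) ** k                     # Pr[tau > k]


def autocov(j):
    return lam_star * (1 - alpha) ** j / alpha  # lam_star * sum_{k >= j} Pr[tau > k]


pairs = [
    (variance_closed_form(m, lam_star, 1 / alpha, ccdf),
     variance_from_autocovariance(m, autocov))
    for m in range(1, 31)
]
```

Both computations agree for every m in the range, and for m = 1 both reduce to the Poisson variance $\lambda_T$.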

For m = 1, the above pgf reduces to

$$E[z^{p_1}] = \exp\{\lambda^*(z - 1) A'(1)\} = \exp\{\lambda_T (z - 1)\}$$

The distribution of the number of active sources or, equivalently, the
total number of cells generated in a random slot, is thus Poisson and a
function of the load $\lambda_T$ only. This 'marginal' distribution is a rather
smooth distribution and is in no way influenced by the exact form of the
distribution of the on-periods. The latter does, however, strongly affect
the correlation structure of the process.

An interesting property of the GI-G-∞ arrival processes is that the
aggregation of two or more such processes is again of that type. This
is a consequence of the fact that the arrival process of new 'customers'
is Poisson. The 'parameters' of the aggregated GI-G-∞ arrival process
are given by

$$\lambda^* = \lambda_1^* + \ldots + \lambda_n^*$$

and

$$A(z) = \sum_{i=1}^{n} \frac{\lambda_i^*}{\lambda^*}\, A_i(z)$$

From this, it is easily seen that the tail of the aggregated message length
will be dominated by the heaviest tail of the constituent message lengths.
In other words, long-range dependent processes will dominate over
short-range dependent ones.
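
The aggregation rule amounts to adding the Poisson rates and mixing the message-length distributions with weights proportional to those rates. A minimal sketch, with a hypothetical `aggregate` function and made-up example distributions stored as dictionaries:

```python
def aggregate(streams):
    """Combine GI-G-infinity arrival processes.

    streams is a list of (lam_star_i, pmf_i) pairs, where pmf_i maps a
    message length n to Pr[tau_i = n]. Returns (lam_star, pmf) of the
    aggregated process.
    """
    lam_star = sum(rate for rate, _ in streams)
    pmf = {}
    for rate, pmf_i in streams:
        weight = rate / lam_star
        for n, p in pmf_i.items():
            pmf[n] = pmf.get(n, 0.0) + weight * p
    return lam_star, pmf


def mean(pmf):
    return sum(n * p for n, p in pmf.items())


# Made-up example: two streams with short and long messages.
streams = [
    (0.01, {1: 0.5, 2: 0.5}),
    (0.03, {1: 0.2, 10: 0.8}),
]
lam_star, pmf = aggregate(streams)
load_aggregate = lam_star * mean(pmf)
load_components = sum(rate * mean(p) for rate, p in streams)
```

The total load is conserved by the aggregation, and any mass that one component places on long messages reappears, scaled by its weight, in the aggregated distribution.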



6 Queueing 

Two approaches seem promising to analyze the queueing behavior of
traffic of the type described above when it is fed into a single-server
system. The first is based on the Benes formula, the second on a
slot-to-slot approach. At this time, however, it is by no means clear whether
these approaches will lead to 'practical' results, such as an approximate
formula for the tail of the distribution of the system contents.




Figure 9: $\log \Pr[u > n]$ versus $\log n$, simulations for GI-G-∞ traffic of type B.

Simulation results, shown in Figures 9 and 10, for a GI-G-∞ arrival
process of type B and C respectively with intensity 0.8 Erlang, give an
indication of the magnitude of the queues - denoted by the variable $u$ -
that can build up. For instance, from Figure 9, we learn that for the long- 
range dependent case, the queue exceeds the order of 10^ cells during 
10% of the time. For the other case, the magnitude of the queue is about 
a hundred times smaller, but still very large. Although the simulations 
are too crude to draw detailed conclusions, the figures already point 
towards a hyperbolic decay of the tail of the queue contents (a straight
line in a log-log plot).

6.1 The Benes approach 

The system contents - observed at the beginning of a slot and denoted
$u_k$ for slot $k$ - is governed by the equation

$$u_{k+1} = p_k + [u_k - 1]^+ = p_k + \max(u_k - 1, 0)$$
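
The evolution equation is straightforward to iterate. The sketch below, with hypothetical function names and a made-up arrival sequence, runs the recursion and compares it with the running-maximum form obtained by unrolling the recursion for a system that is empty at slot 0, namely the maximum over $l = 0, \ldots, k$ of $p_k + \ldots + p_{k-l} - l$.

```python
def system_contents(arrivals, u0=0):
    """Iterate u_{k+1} = p_k + max(u_k - 1, 0); returns [u_0, u_1, ..., u_K]."""
    u = [u0]
    for p in arrivals:
        u.append(p + max(u[-1] - 1, 0))
    return u


def running_maximum(arrivals, k):
    """u_{k+1} as the maximum over l = 0 .. k of (p_k + ... + p_{k-l} - l).

    Valid for a system that is empty at the beginning of slot 0.
    """
    best = None
    partial = 0
    for l in range(k + 1):
        partial += arrivals[k - l]
        value = partial - l
        best = value if best is None else max(best, value)
    return best


# Made-up arrival sequence p_0, ..., p_5.
arrivals = [2, 0, 3, 0, 0, 1]
u = system_contents(arrivals)
unrolled = [running_maximum(arrivals, k) for k in range(len(arrivals))]
```

Both computations give the same system contents in every slot, which is the sample-path identity underlying the approach of this subsection.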



Figure 10: $\log \Pr[u > n]$ versus $\log n$, simulations for GI-G-∞ traffic of type C.

The 'Benes result' [12, 18] for this system reads

$$u_{k+1} = \max_{l \geq 0} \left( p_k + p_{k-1} + \ldots + p_{k-l} - l \right)$$

from which one obtains

$$\Pr[u_{k+1} > m] = \sum_{l=0}^{+\infty} \Pr[p_k + p_{k-1} + \ldots + p_{k-l} > m + l \mid u_{k-l} = 0]\, \Pr[u_{k-l} = 0] \qquad (9)$$


This result is appealing, since it provides a formula for $\Pr[u_{k+1} > m]$
irrespective of the precise nature of the arrival process. Also, a simi- 
lar expression can be derived for systems with service capacity larger 
than 1 or variable service capacity. Intriguing questions are what the 
link is between this general result and the general observation made in 
e.g. [5, 11] concerning the impact of the power spectral density at low 
frequencies, and how, for instance, the index of dispersion of the traffic 
process relates to the probabilities in the RHS of the above formula. 

The event $u_{k-l} = 0$ implies - at least - that no sources were active
in the slot preceding slot $k - l$. This latter observation is sufficient to
determine the future evolution of the process. One can show that, for
the GI-G-∞ model,

$$E[z^{p_k + p_{k-1} + \ldots + p_{k-l}} \mid u_{k-l} = 0] = \exp\left\{ \lambda^* \sum_{n=0}^{l} \Pr[\tau_A > n](l + 1 - n)\, z^n (z - 1) \right\}$$

Introducing residues in (9), we get the following expression

$$\Pr[u_{k+1} > m] = \sum_{l=0}^{+\infty} \Pr[u_{k-l} = 0] \sum_{\kappa=m+l+1}^{+\infty} \mathrm{Res}_{z=0} \left[ z^{-(\kappa+1)} \exp\left\{ \lambda^* \sum_{n=0}^{l} \Pr[\tau_A > n](l + 1 - n)\, z^n (z - 1) \right\} \right] \qquad (10)$$

For a system in equilibrium, $\Pr[u_{k-l} = 0]$ is given by $1 - \rho$, a
well-known result from queueing theory. Hereby, $\rho$ is the load of the system
and equals $\lambda_T$. It remains to be determined if replacing residues around
$z = 0$ by residues around the other singularities (poles or branch points) of
the function involved will lead to 'practical' results, or if an accurate
numerical transform inversion is feasible.



6.2 A second approach 

The queueing of discrete-time on-off sources was studied by a
slot-to-slot approach in e.g. [20] for a finite number of sources (with geometric
off-times), and in e.g. [21] for an infinite number of sources. The model
in the latter paper is more general than the model of Section 5.2, in that
the number of new sources becoming active during a slot can have an
arbitrary distribution. The special case of a Poisson distribution then
leads to the GI-G-∞ arrival process considered here. It is noteworthy
that the Poisson distribution has a number of properties which simplify
the analysis and results to some extent.

The analysis can proceed as follows. Consider the joint pgf
$P_k(z, x_1, x_2, \ldots)$ of $u_k$, the number of cells in the system, and of the
numbers of messages in the equivalent GI-G-∞ model which still contain
$i$ cells ($i = 1, 2, \ldots$), i.e., which will still generate a single arrival
per slot during the $i$ slots to come. Here $z$ is associated with $u_k$ and
$x_i$ with the number of messages still containing $i$ cells. One
can then establish the following recurrence relation

$$P_{k+1}(z, x_1, x_2, \ldots) = z^{-1} \exp\left\{ \lambda^* \sum_{i=1}^{+\infty} a(i)(x_i - 1) \right\} \cdot \Big( P_k(z, z, z x_1, z x_2, \ldots) + (z - 1)\, P_k(0, z, z x_1, z x_2, \ldots) \Big) \qquad (11)$$

The pgf $P_k(0, x_1, x_2, \ldots)$ can easily be obtained by observing that the
queue being empty at the beginning of a slot implies no messages arrived
during the previous slot. This implies that the only messages in the
GI-G-∞ queue are new messages. This straightforwardly leads to

$$P_k(0, x_1, x_2, \ldots) = \Pr[u_k = 0] \exp\left\{ \lambda^* \sum_{i=1}^{+\infty} a(i)(x_i - 1) \right\}$$



We will not go further into the details of the analysis here, but it is
possible to derive from this an expression for the mean buffer contents
in the steady state. It is



iis. + 2(1^ 

This expression can also be found from that in [20] by a limiting proce- 
dure, or from that in [21] by assuming a Poisson arrival process for new 
messages. The formula contains the variance of the durations of the 
on-periods and becomes infinite for heavy-tailed on-time distributions 
having infinite variance. 

We believe - but couldn’t prove yet - that, in general, 

Pr[rA = m] ~ Pr[u > m] ~ (12) 

while 

Pr[rA = m] ~ Pr[u > m] ~ (1^) 

Similar observations have been made for fluid-flow models [2]. Of course,
in order for a result like (12) or (13) to be of 'practical' value, one should
also be able to derive the constant of proportionality, i.e., the 'intercept'
of the curve. In e.g. [21] it was assumed that the dominating
singularity of the pgf of the system contents is an isolated pole, somewhere
in the interval $(1, +\infty)$ of the real line, which then leads to geometric tail
decay. However, this assumption is no longer valid when heavy-tailed
on-time distributions are involved, since the corresponding pgf's have a
branch point at $z = 1$. It remains to be studied how this approach has
to be modified to deal with that case.

Both approaches presented here, the Benes approach and the
slot-to-slot approach, are naturally related, since both pertain to the same
model. The connection becomes apparent when one recurses the relation
(11) (infinitely) many times, and sets all $x_i$ equal to one. This yields

$$P_{k+1}(z, 1, 1, \ldots) = \lim_{M \to \infty} \Bigg\{ \sum_{l=0}^{M} \Pr[u_{k-l} = 0]\, \frac{z - 1}{z^{l+1}} \exp\left\{ \lambda^* \sum_{n=0}^{l} \Pr[\tau_A > n](l + 1 - n)\, z^n (z - 1) \right\}$$
$$\qquad + \frac{1}{z^{M+1}} \exp\left\{ \lambda^* \sum_{n=0}^{M-1} \Pr[\tau_A > n](M - n)\, z^n (z - 1) \right\} \cdot P_{k-M}(z, z, z^2, z^3, \ldots, z^M, z^{M+1}, z^{M+1}, \ldots) \Bigg\}$$

Comparing this with equation (10), one easily recognizes the terms they
have in common. However, establishing how they converge exactly still
requires further study.



7 Conclusions 

A number of results were presented concerning traffic characteristics and
queueing behavior of discrete-time on-off sources. At various instances,
the distinction between short-range and long-range dependent traffic was
touched upon. Some key issues remain unsolved and, as such, create
challenging areas for future research.

In its strictest sense, long-range dependence is present when e.g. the
on-period distribution of the sources has an infinite variance. This leads
to an infinite 'DC-component' in the power spectral density, a system
contents having infinite mean, etcetera. However, in the authors'
opinion, the distinction between geometric tail decay and hyperbolic tail
decay is as important as that between short- and long-range dependence.
A relation like (12) shows that the system contents can still have a slowly
decaying tail, even when the distributions of the sources have a finite
variance and when, as such, the system contents has a finite mean. Also
for this type of 'short-range dependent' traffic, a tremendous amount of
buffering might be needed in order to avoid cell loss, or, in other words,
to allow for feasible statistical multiplexing.


Appendix A

In this appendix, an expression is derived for

$$S(f) = \sum_{m=-\infty}^{+\infty} C(m)\, e^{2\pi\imath f m}$$

whereby $C(m)$ was defined as $E[(q_0 - \lambda)(q_m - \lambda)]$. Since there is at most
one arrival per slot, one has $E[q_0 q_m] = \Pr[(q_0 = 1) \,\&\, (q_m = 1)]$. This
can be further expressed as

$$\sum_{k=0}^{+\infty} \Pr[q_m = 1 \mid (q_0 = 1) \,\&\, (\tau_A^* = k)]\, \Pr[(q_0 = 1) \,\&\, (\tau_A^* = k)] \qquad (14)$$

whereby $\tau^*$ denotes the number of remaining slots of the on- or
off-period in which slot 0 falls (not counting slot 0 itself). The factor
$\Pr[(q_0 = 1) \,\&\, (\tau_A^* = k)] = \Pr[\tau_A^* = k \mid q_0 = 1]\, \Pr[q_0 = 1]$ can be
expressed as $a^*(k)\lambda$. Recall - see Section 2 - that the pgf associated with
the distribution $a^*(k)$ is $A^*(z) = [A(z) - 1]/[A'(1)(z - 1)]$.

The probability $\Pr[q_m = 1 \mid (q_0 = 1) \,\&\, (\tau_A^* = k)]$ is given by

$$\Pr[q_m = 1 \mid (q_0 = 1) \,\&\, (\tau_A^* = k)] = \begin{cases} 1 & m \leq k \\ \Pr[q_{m-k} = 1 \mid B_1] & m > k \end{cases} \qquad (15)$$

whereby $B_1$ was used to denote the event that slot 1, i.e., the slot just
after slot 0, is the first slot of an off-period. By considering all possible
values for the durations of that off-period, with proper weights, one finds

$$\Pr[q_m = 1 \mid B_1] = \sum_{k=1}^{+\infty} b(k) \left( I(m \leq k) \cdot 0 + I(m > k) \cdot \Pr[q_{m-k} = 1 \mid A_1] \right) \qquad (16)$$

As above, $A_1$ denotes the event that slot 1 is the first slot of an
on-period. $I(\cdot)$ denotes the indicator function. The probability
$\Pr[q_m = 1 \mid A_1]$ can, likewise, be expressed as

$$\Pr[q_m = 1 \mid A_1] = \sum_{k=1}^{+\infty} a(k) \left( I(m \leq k) \cdot 1 + I(m > k) \cdot \Pr[q_{m-k} = 1 \mid B_1] \right) \qquad (17)$$

Introducing z-transforms in equations (16) and (17), and performing
some algebra, one can show that

$$X_B(z) = \sum_{m=1}^{+\infty} \Pr[q_m = 1 \mid B_1]\, z^m = z\, \frac{A(z) - 1}{z - 1} \cdot \frac{B(z)}{1 - A(z)B(z)}$$

and

$$X_A(z) = \sum_{m=1}^{+\infty} \Pr[q_m = 1 \mid A_1]\, z^m = z\, \frac{A(z) - 1}{z - 1} \cdot \frac{1}{1 - A(z)B(z)}$$

Returning to equations (14) and (15), one obtains

$$\sum_{m=1}^{+\infty} E[q_0 q_m]\, z^m = \lambda \left( z\, \frac{A^*(z) - 1}{z - 1} + A^*(z)\, X_B(z) \right)$$

and, after some further manipulation,

$$\sum_{m=1}^{+\infty} C(m)\, z^m = \sigma^2 Q(z) = \sigma^2\, \frac{z\,(P(z) - 1)}{z - 1}$$

whereby $\sigma^2 = \mathrm{Var}[q_k] = \lambda(1 - \lambda)$ and

$$P(z) = \frac{A(z) - 1}{A'(1)(z - 1)} \cdot \frac{B(z) - 1}{B'(1)(z - 1)} \cdot \frac{[A'(1) + B'(1)](z - 1)}{A(z)B(z) - 1}$$

Finally, one obtains equation (1)

$$S(f) = \sigma^2 \left( 1 + Q(e^{2\pi\imath f}) + Q(e^{-2\pi\imath f}) \right)$$



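Appendix B

In this appendix, an expression is derived for

$$J(z,t) = \sum_{m=1}^{+\infty} t^m\, E[z^{q_1 + \ldots + q_m}]$$

The derivations are quite similar to those in Appendix A. The starting point
is the expression

$$J(z,t) = \frac{A'(1)}{A'(1) + B'(1)}\, J_A(z,t) + \frac{B'(1)}{A'(1) + B'(1)}\, J_B(z,t) \qquad (18)$$

whereby

$$J_A(z,t) = \sum_{m=1}^{+\infty} t^m \left( \sum_{k=0}^{+\infty} a^*(k)\, E[z^{q_1 + \ldots + q_m} \mid (q_0 = 1) \,\&\, (\tau_A^* = k)] \right)$$

and

$$J_B(z,t) = \sum_{m=1}^{+\infty} t^m \left( \sum_{k=0}^{+\infty} b^*(k)\, E[z^{q_1 + \ldots + q_m} \mid (q_0 = 0) \,\&\, (\tau_B^* = k)] \right)$$

It is easily shown that

$$E[z^{q_1 + \ldots + q_m} \mid (q_0 = 1) \,\&\, (\tau_A^* = k)] = \begin{cases} z^m & m \leq k \\ z^k\, E[z^{q_1 + \ldots + q_{m-k}} \mid B_1] & m > k \end{cases}$$

and

$$E[z^{q_1 + \ldots + q_m} \mid (q_0 = 0) \,\&\, (\tau_B^* = k)] = \begin{cases} 1 & m \leq k \\ E[z^{q_1 + \ldots + q_{m-k}} \mid A_1] & m > k \end{cases}$$

As in the previous appendix, $B_1$ and $A_1$ denote the events that slot 1 is
the first slot of an off- or an on-period respectively.

One has

$$E[z^{q_1 + \ldots + q_m} \mid B_1] = \sum_{k=1}^{+\infty} b(k) \left( I(m \leq k) \cdot 1^m + I(m > k) \cdot E[z^{q_1 + \ldots + q_{m-k}} \mid A_1] \right)$$

and

$$E[z^{q_1 + \ldots + q_m} \mid A_1] = \sum_{k=1}^{+\infty} a(k) \left( I(m \leq k) \cdot z^m + I(m > k) \cdot z^k\, E[z^{q_1 + \ldots + q_{m-k}} \mid B_1] \right)$$

Introducing a z-transform (in variable $t$) in the above equations and
performing some straightforward algebra, one obtains

$$K_A(z,t) = \sum_{m=1}^{+\infty} t^m\, E[z^{q_1 + \ldots + q_m} \mid A_1] = \frac{1}{1 - A(zt)B(t)} \left( zt\, \frac{A(zt) - 1}{zt - 1} + A(zt)\, t\, \frac{B(t) - 1}{t - 1} \right)$$

and

$$K_B(z,t) = \sum_{m=1}^{+\infty} t^m\, E[z^{q_1 + \ldots + q_m} \mid B_1] = \frac{1}{1 - A(zt)B(t)} \left( t\, \frac{B(t) - 1}{t - 1} + B(t)\, zt\, \frac{A(zt) - 1}{zt - 1} \right)$$

Further,

$$J_A(z,t) = zt\, \frac{A^*(zt) - 1}{zt - 1} + A^*(zt)\, K_B(z,t) = \frac{zt}{1 - zt} + \frac{t A^*(zt)(B(t) - 1)(z - 1)}{(zt - 1)(t - 1)(1 - A(zt)B(t))}$$

and

$$J_B(z,t) = t\, \frac{B^*(t) - 1}{t - 1} + B^*(t)\, K_A(z,t) = \frac{t}{1 - t} - \frac{t B^*(t)(A(zt) - 1)(z - 1)}{(zt - 1)(t - 1)(1 - A(zt)B(t))}$$

Inserting these expressions in equation (18), one finally obtains

$$J(z,t) = \frac{1}{A'(1) + B'(1)} \left( \frac{A'(1)\, zt}{1 - zt} + \frac{B'(1)\, t}{1 - t} - \frac{A'(1) B'(1)\, t^2 (z - 1)^2 A^*(zt) B^*(t)}{(1 - zt)(1 - t)(1 - A(zt)B(t))} \right)$$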


Abstract

In this paper we study the correlation structure of the output process of an 
ATM multiplexer. We consider two special cases: (i) the output process of
the D-BMAP/D/1/N queue, a generic model for an ATM multiplexer, and
(ii) a process which results from a renewal process which shares the output 
link of a multiplexer with other connections. Both output processes belong 
to the versatile class of discrete-time Markovian arrival processes (D-MAP’s). 
We derive an expression for the Index of Dispersion for Counts (IDC) and 
for the Index of Dispersion for Intervals (IDI) of a D-MAP. Two classes of
D-MAP's are considered depending on the eigenvalues of the transition matrix:
those with an aperiodic transition matrix and those with a periodic transition 
matrix. For both cases we derive a closed form formula for the limit of the 
IDC (which coincides with the limit of the IDI) and for the convergence rate 
of the covariance sequence. These results are then applied to the two special 
cases of output processes. 



Keywords 

ATM, Correlation, D-MAP, IDC, IDI, multiplexer 



1 INTRODUCTION 

The Asynchronous Transfer Mode (ATM) must contribute to an efficient use
of the network resources while guaranteeing the required Quality of Service
of the different traffic streams. To quantitatively study these goals of ATM,
a considerable effort has been made to model ATM traffic sources together
with the different network elements. The basic queueing model for
these studies is a multiplexer whose input consists of a superposition of ATM
traffic sources. Several approaches have been used to derive the required
performance measures of such a multiplexer (e.g. fluid flow, matrix-analytical,
generating functions, etc.). These models are valid at the entrance of the
ATM network, but may be inadequate as a generic traffic model for e.g.
end-to-end delay studies.

To analyze a whole path in an ATM network analytically, modeling and
characterizing the output process of an ATM multiplexer is an essential step.
This output process becomes the input process, together with external sources,
of the next network element. In addition, a characterization of the output
process allows an evaluation of the smoothing effect on bursty traffic when
passing through a multiplexer.

The main problem when using the output process of the previous stage as 
input to the next stage is that after a few stages the resulting process be- 
comes very complicated and hence intractable. Therefore, it is necessary to 
capture its most significant characteristics. Two very important properties are 
the correlation between the number of cells in the output process in successive 
slots and the correlation between interdeparture times. Many studies (Heffes 
& Lucantoni 1986, Sriram & Whitt 1986) have confirmed the impact of the 
autocovariance sum on the queueing performance. In particular these studies 
stress the importance of the Index of Dispersion for Counts (IDC) and the 
Index of Dispersion for Intervals (IDI), together with their limits. 

In this paper we study a particular class of output processes, namely the 
discrete-time Markovian arrival processes (D-MAP’s). This choice is moti- 
vated by two important special cases. 

(i) In previous papers (Blondia & Casals 1992, Blondia 1993), it has been 
shown that the D-MAP is a generic model for ATM traffic, since the output 
process of a multiplexer whose input consists of a superposition of D-MAP’s 
is again a D-MAP. 

(ii) Assume that a tagged ATM connection, modeled by means of a process 
with renewal cell inter arrival time distribution, shares a multiplexer with other 
connections (modeled as a batch process with renewal batch size distribution) . 
The tagged connection belongs, after passing through the multiplexer, to the 
class of D-MAP’s. More details can be found in (Blondia & Casals 1996). 

For the class of D-MAP’s, we study the covariance of the interdeparture time, 
the covariance of the number of departures in a slot, the IDC, the IDI and 
their limits. 

We distinguish two classes of D-MAP’s: those with an aperiodic transition
matrix and those whose transition matrix is periodic. These two classes
naturally arise from the two examples of output processes considered in this
paper. Other examples of the use of periodic D-MAP’s as well as some properties
of their correlations can be found in (Herrmann 1994a, Herrmann 1994b).

The paper is organized as follows. Section 2 recalls the definition of the
discrete-time Markovian arrival process and some related notions. We discuss
the two important examples of D-MAP’s which are models for the output
processes discussed above. In Section 3, we investigate the correlation
structure of the number of arrivals in a slot of a D-MAP. In Section 4, the
correlation structure of the interarrival times of a D-MAP is investigated. We
also derive expressions for the limit of the IDC and the IDI. These results are
then applied to numerical examples in Section 5. Conclusions are drawn in
Section 6.

2 A MODEL FOR THE OUTPUT OF AN ATM MULTIPLEXER 

In this section we identify two important models for output processes of an 
ATM multiplexer. Both these examples belong to a class of versatile Marko- 
vian processes, called D-MAP ’s. First we recall the definition of this class of 
processes. 



2.1 A Discrete-time Markovian Arrival Process (D-MAP) 

We recall the definition of the D-BMAP, a batch Markovian arrival process 
which has proven its usefulness in many papers (Blondia & Casals 1992, 
Blondia 1993, etc ...). This process is the discrete-time version of the MAP 
defined in (Lucantoni 1991), which was originally called N-Process in (Neuts 
1979). 

Consider a discrete-time Markov chain with transition matrix $D$. Suppose
that at time $k$ this chain is in some state $i$, $1 \le i \le m$. At the next time
instant $k + 1$, there occurs a transition to another or possibly the same state
and a batch arrival may or may not occur. With probability $(d_0)_{i,j}$,
$1 \le i \le m$, there is a transition to state $j$ without an arrival, and with
probability $(d_n)_{i,j}$, $1 \le i \le m$, $n \ge 1$, there is a transition to state $j$
with a batch arrival of size $n$. We have that

$$\sum_{n=0}^{\infty} \sum_{j=1}^{m} (d_n)_{i,j} = 1.$$

Clearly the matrix $D_0$ with elements $(d_0)_{i,j}$ governs transitions that
correspond to no arrivals, while the matrices $D_n$ with elements $(d_n)_{i,j}$,
$n \ge 1$, govern transitions that correspond to arrivals of batches of size $n$.

The matrix $D = \sum_{n=0}^{\infty} D_n$ is the transition matrix of the underlying
Markov chain. Let $\pi$ be the stationary probability vector of this Markov
process, i.e. $\pi D = \pi$, $\pi e = 1$, where $e$ is a column vector of 1’s.

The fundamental arrival rate $\lambda$ of this process is given by
$\lambda = \pi \left( \sum_{k=1}^{\infty} k D_k \right) e$.

A D-MAP is a special case of a D-BMAP, where arrivals have a batch of size 
1. For examples we refer to (Blondia 1993). 

We recall some results concerning D-MAP’s which are needed in the sequel
of this paper. A D-MAP is characterized by means of its two matrices $D_0$
and $D_1$. Let $\pi$ be the steady state vector of the underlying Markov chain
$D_0 + D_1$. The fundamental arrival rate $\lambda$ of this process is now given by
$\lambda = \pi D_1 e$. Observe the phase of the process at arrival instants. The phase
transition matrix between these instants is given by $(I - D_0)^{-1} D_1$.

Let $p$ be the stationary vector of the phase at arrival instants, i.e.

$$p\,(I - D_0)^{-1} D_1 = p, \qquad p\,e = 1.$$

This vector can be expressed in terms of $\pi$ as follows: $p = \lambda^{-1} \pi D_1$
(indeed, $\pi D_1 = \pi (I - D_0)$, so that $\pi D_1 (I - D_0)^{-1} D_1 = \pi D_1$). Two
special examples of a D-MAP will be studied in the next subsections.
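The quantities recalled above can be checked numerically on a toy case. The following Python sketch is only an illustration: the two-state matrices `D0` and `D1` are made-up values, the helper names are not from the paper, and power iteration is used under the assumption that $D_0 + D_1$ is irreducible and aperiodic.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def stationary(p, iters=500):
    """Stationary row vector by power iteration (assumes an aperiodic, irreducible chain)."""
    pi = [1.0 / len(p)] * len(p)
    for _ in range(iters):
        pi = vec_mat(pi, p)
        total = sum(pi)
        pi = [x / total for x in pi]   # renormalise to absorb rounding drift
    return pi

# Hypothetical two-state D-MAP: D0 (no arrival) and D1 (one arrival); D0 + D1 is stochastic.
D0 = [[0.6, 0.1], [0.2, 0.2]]
D1 = [[0.2, 0.1], [0.3, 0.3]]
D = [[D0[i][j] + D1[i][j] for j in range(2)] for i in range(2)]

pi = stationary(D)              # pi (D0 + D1) = pi, pi e = 1
pi_D1 = vec_mat(pi, D1)
lam = sum(pi_D1)                # fundamental arrival rate: pi D1 e
p = [x / lam for x in pi_D1]    # phase distribution at arrival instants: pi D1 / lambda
```

For these illustrative matrices the stationary vector works out by hand to (5/7, 2/7), which gives a quick sanity check of the sketch.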



2.2 The Output Process of the D-BMAP/D/1/N Queue

In (Blondia 1993), it has been shown that the output process of a queue of
type D-BMAP/D/1/N belongs to the class of D-MAP’s. Indeed, let the input
process be defined by the matrices $D_n$, $n \ge 0$. Then the output process is
again a D-MAP, whose parameters are derived in (Blondia 1993).

The process has been studied in detail in (Blondia & Casals 1996), e.g. the
distribution of the length of $k$ consecutive interdeparture times, the busy
period distribution and the interdeparture time distribution are derived there.



2.3 A Renewal Process Mixed with Background Traffic 

We tag a connection which shares the output link of a multiplexer with other 
connections, and describe this tagged connection after passage through the 
multiplexer (See Figure 1). First we show that under certain conditions the 
resulting process belongs to the class of D-MAPs. Details can be found in 
(Blondia & Casals 1996). 

We consider a discrete-time queueing system with deterministic service time, 
the duration of which equals one time-unit (i.e. a time-slot). We assume the 
queue has a finite capacity of N cells. 

The input traffic consists of 2 streams:

• Tagged traffic stream

This stream has a cell interarrival time distribution which is assumed to
be renewal, defined by the vector $b = (b_1, \ldots, b_K)$, where

$b_k = \Pr\{$ interarrival time between consecutive cells is $k$ slots $\}$.
Figure 1 Queueing model for departure process of tagged stream



Examples:

(i) CBR traffic (interarrival time is deterministic);

(ii) On/off sources, where the on and off periods are assumed to have a
duration which is geometrically distributed and such that while in the on
period, the cell interarrival time is deterministic.

• Background traffic

The number of background cells arriving in a time slot is a renewal process;
let

$a_k = \Pr\{$ $k$ arrivals during a time slot $\}$, $k \ge 0$.

Example:

Poisson background traffic with arrival rate $\lambda$, $a_k = e^{-\lambda} \lambda^k / k!$, $k \ge 0$.



In order to describe the tagged traffic stream, after it has left the multiplexer
which it shares with background traffic, we introduce the matrices $H_1$ and
$H_2$.

The transition matrix of the number of cells in the system (between consec- 
utive slots) when only the background stream is taken into account is given 
by 



Hi 



f do 
The transition matrix of the number of cells in the queueing system between
consecutive slots, when both streams are taken into account and knowing that
an arrival of the tagged stream occurs, is given by





$$H_2 = \begin{pmatrix}
a_0 & a_1 & \cdots & a_{N-2} & \sum_{n=N-1}^{\infty} a_n \\
0 & a_0 & \cdots & a_{N-3} & \sum_{n=N-2}^{\infty} a_n \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & a_0 & \sum_{n=1}^{\infty} a_n \\
0 & 0 & \cdots & 0 & 1
\end{pmatrix}$$
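As a small illustration of how such a matrix could be assembled, the Python sketch below builds an $N \times N$ matrix from Poisson batch probabilities $a_k$. It assumes the upper-triangular structure suggested by the rows shown above (entries $a_0, a_1, \ldots$ starting on the diagonal, with the remaining tail mass in the last column); the function names `poisson_pmf` and `build_h2` and the parameter values are hypothetical.

```python
import math

def poisson_pmf(rate, kmax):
    """a_k = exp(-rate) * rate**k / k! for k = 0..kmax."""
    return [math.exp(-rate) * rate ** k / math.factorial(k) for k in range(kmax + 1)]

def build_h2(a, n):
    """Assumed structure: row i holds a_0, a_1, ... from the diagonal onwards,
    and the last column collects the remaining tail probability."""
    h2 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        width = n - 1 - i                 # number of a_k entries before the last column
        for k in range(width):
            h2[i][i + k] = a[k]
        h2[i][n - 1] = 1.0 - sum(a[:width])
    return h2

a = poisson_pmf(0.4, 10)                  # illustrative background rate of 0.4 cells per slot
H2 = build_h2(a, 4)                       # illustrative buffer size N = 4
```

Each row sums to one by construction, and the last row reduces to (0, ..., 0, 1), matching the final row displayed above.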
Now we give an important property of the interdeparture time of consecutive
cells of the tagged stream. The proof is straightforward.



Property 1 Assume that two consecutive arrivals of the tagged stream observe
a queue length of $i_1$, resp. $i_2$. Then the interdeparture time between these
two cells of the tagged stream is $k - i_1 + i_2$, with probability $b_k$,
$1 \le k \le K$.



We show that the tagged stream after the multiplexer is a D-MAP.
Define for each slot $k$ the following variables:



• $s_k$: number of slots to go until the next departure of the tagged stream.
Clearly, its value is the interdeparture time when a cell of the tagged stream
leaves the multiplexer and it decreases by 1 at each slot, until it reaches the
value 0 (i.e. when the next departure occurs) ($0 \le s_k \le S = N + K - 1$).

• $i_k$: queue length of the multiplexer at the arrival instant of a tagged cell,
which was the last of the tagged stream to depart before slot $k$. This means
that the value of $i_k$ remains constant between departures of the tagged
stream ($1 \le i_k \le N$).



Then $\{(s_k, i_k) \mid k \ge 0\}$ forms a discrete-time Markov chain, with the
following transition matrix $D$.

• If $s_k = s > 0$ and $1 \le i_k = i \le N$,

$$D_{(s,i),(s-1,i)} = 1,$$

and 0 elsewhere,

• If $s_k = s = 0$ and $1 \le i \le N$ (i.e. a tagged cell departed from the
multiplexer at slot $k$):

$$D_{(0,i),(k+j-i-1,\,j)} = b_k\, [H_1^{\,k-1} H_2]_{i,j}, \quad k = 1, \ldots, K, \quad j = 1, \ldots, N,$$

and 0 elsewhere.
The matrix $D$ defined in this way is the transition matrix of this Markov
chain. The matrices $D_0$ and $D_1$, defined row by row as

$$[D_0]_{(s,i)} = \begin{cases} [D]_{(s,i)} & s \ne 0 \\ 0 & \forall (0,i) \end{cases}
\quad \text{and} \quad
[D_1]_{(s,i)} = \begin{cases} [D]_{(0,i)} & \forall (0,i) \\ 0 & \forall (s,i),\ s \ne 0, \end{cases}$$

describe the cell generation process of the D-MAP that models the departure
process of the tagged traffic stream.

Hence under the assumption that cells of both tagged and background connec- 
tions arrive according to independent renewal processes, the output process 
of the tagged traffic stream belongs to the class of D-MAP ’s. 

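The transition matrix has the following form

$$D = \begin{pmatrix}
U_0 & U_1 & U_2 & \cdots & U_{S-1} & U_S \\
I & 0 & 0 & \cdots & 0 & 0 \\
0 & I & 0 & \cdots & 0 & 0 \\
\vdots & & \ddots & & & \vdots \\
0 & 0 & 0 & \cdots & I & 0
\end{pmatrix}$$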



In view of the special form of $D$, it is possible to simplify the computations
as follows.

First the steady state vector $\pi = (\pi_0, \ldots, \pi_S)$ of the matrix $D$ is
computed.

As $\pi$ satisfies $\pi D = \pi$, we have that

$$\pi_0 U_{i-1} + \pi_i = \pi_{i-1}, \quad i = 1, \ldots, S,$$

$$\pi_0 U_S = \pi_S.$$

This implies that $\pi_0$ satisfies $\pi_0 \sum_{i=0}^{S} U_i = \pi_0$. Writing
$H = \sum_{i=0}^{S} U_i$, we have that

$$\pi_0 H = \pi_0. \qquad (1)$$

Furthermore, as $\pi e = 1$, we derive that

$$\pi_0 \Big[ \sum_{i=0}^{S} (i + 1) U_i \Big] e = 1. \qquad (2)$$

Formulas (1) and (2) completely determine $\pi_0$. The other $\pi_n$’s are given by

$$\pi_n = \pi_0 \Big( \sum_{i=n}^{S} U_i \Big).$$
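The simplification can be tried out on the degenerate case in which every block is a single number. The Python sketch below assumes $N = 1$, so that each $U_i$ is a scalar `u[i]` and formula (1) holds trivially; the values in `u` and the function name are illustrative only.

```python
def stationary_scalar_blocks(u):
    """Scalar analogue of formulas (1)-(2): u[i] stands in for the block U_i (N = 1).

    Returns (pi_0, ..., pi_S) with pi_n = pi_0 * sum_{i >= n} u[i].
    """
    norm = sum((i + 1) * ui for i, ui in enumerate(u))   # formula (2): pi_0 * norm = 1
    pi0 = 1.0 / norm
    return [pi0 * sum(u[n:]) for n in range(len(u))]

u = [0.5, 0.3, 0.2]                  # hypothetical first-row entries, summing to 1
pi = stationary_scalar_blocks(u)
```

In this scalar setting the balance equations $\pi_0 U_{i-1} + \pi_i = \pi_{i-1}$ can be verified directly on the returned vector.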
3 CORRELATION STRUCTURE OF THE NUMBER OF
ARRIVALS OF A D-MAP 



In this section we refer to the occurrence of a cell in a slot as a cell arrival
(since the D-MAP is called a Markovian arrival process), but the results will
be applied in the next section to D-MAP’s which are departure processes. We
study the correlation between both successive interarrival times and the number
of arrivals in successive slots.

Recall that when $(X_1, X_2, \ldots, X_k)$ are random variables, then the correlation
between $X_1$ and $X_k$ can be expressed in terms of the covariance matrix

$$\mathbf{cov}(X_1 X_k) = E[(X_1 - \mu_1)(X_k - \mu_k)]$$

with $\mu_1$ and $\mu_k$ being the scalar means of $X_1$ and $X_k$.
The scalar covariance function is given by

$$\mathrm{cov}(X_1 X_k) = \pi\, \mathbf{cov}(X_1 X_k)\, e = \pi E[X_1 X_k] e - \mu_1 \mu_k.$$



3.1 Correlation between Arrivals 

In this subsection we study the correlation between the numbers of arrivals in
successive slots.

Let $(X_1, \ldots, X_k)$ be random variables, where $X_i$ is the number of arrivals (0
or 1) at time slot $i$. In (Blondia & Theimer 1989), it has been shown that

Theorem 1 The scalar covariance of $X_1$ and $X_k$ is given by

$$\mathrm{cov}(X_1 X_1) = \pi D_1 e - (\pi D_1 e)^2$$

$$\mathrm{cov}(X_1 X_{k+1}) = \pi D_1 (D_0 + D_1)^{k-1} D_1 e - (\pi D_1 e)^2, \quad k \ge 1.$$

Remark that an extension of this theorem for D-BMAP’s was obtained in
(Blondia 1993).



(a) Correlation of an aperiodic D-MAP 

In order to identify over what time period correlations do exist, we study the
way the sequence of covariances $\mathrm{cov}(X_1 X_k)$ converges to its limiting value
(if it exists).

We have to distinguish between two cases depending on the number of
eigenvalues of $D_0 + D_1$ on the unit circle.

Theorem 2 Assume that the matrix $D_0 + D_1$ is diagonalizable and that the
eigenvalue 1 is the only one on the unit circle. The convergence of the
covariance of the number of arrivals towards zero is geometric. The ratio is
determined by the eigenvalue $\lambda_2$ of $D_0 + D_1$ with the largest absolute value,
excluding 1. Thus

$$|\mathrm{cov}(X_1, X_k)| = c \cdot |\lambda_2|^k, \quad \text{for } k \to \infty,$$

with $c$ a certain constant.

PROOF. Since $D_0 + D_1$ is diagonalizable, we can find the following spectral
representation (see e.g. (Cinlar 1975), Theorem 5.1, p.379)

$$D_0 + D_1 = B_1 + \lambda_2 B_2 + \cdots + \lambda_n B_n.$$

From the theory of Perron-Frobenius we know that an irreducible stochastic
matrix $D_0 + D_1$ has a unique eigenvalue 1 and the other eigenvalues have
absolute value $|\lambda| < 1$.

Substituting this representation for $D_0 + D_1$ in the expression for
$\mathrm{cov}(X_1, X_k)$ derived in Theorem 1 yields

$$\mathrm{cov}(X_1, X_k) = \lambda_2^{k-2}\, \pi D_1 B_2 D_1 e + \cdots + \lambda_n^{k-2}\, \pi D_1 B_n D_1 e$$

and this implies the required result. ■
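The geometric decay can be observed numerically. The Python sketch below evaluates the covariance expressions of Theorem 1 for a made-up two-state D-MAP whose matrix $D_0 + D_1$ has eigenvalues 1 and 0.3; the matrices, the hand-computed stationary vector (5/7, 2/7) and the function names are assumptions of this illustration, not data from the paper.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]

def covariances(D0, D1, pi, kmax):
    """Return [cov(X_1, X_{1+k}) for k = 0..kmax] using the formulas of Theorem 1."""
    n = len(pi)
    D = [[D0[i][j] + D1[i][j] for j in range(n)] for i in range(n)]
    left = vec_mat(pi, D1)                  # pi D1
    right = [sum(row) for row in D1]        # D1 e
    lam = sum(left)                         # pi D1 e
    out = [lam - lam ** 2]                  # lag 0
    for _ in range(kmax):
        out.append(sum(l * r for l, r in zip(left, right)) - lam ** 2)
        left = vec_mat(left, D)             # one more factor (D0 + D1)
    return out

# Hypothetical example; D0 + D1 = [[0.8, 0.2], [0.5, 0.5]] has eigenvalues 1 and 0.3.
D0 = [[0.6, 0.1], [0.2, 0.2]]
D1 = [[0.2, 0.1], [0.3, 0.3]]
pi = [5 / 7, 2 / 7]
covs = covariances(D0, D1, pi, 3)
ratios = [covs[k + 1] / covs[k] for k in range(1, 3)]   # successive ratios from lag 1 on
```

In this two-state example the successive ratios equal the second eigenvalue, which is the behaviour Theorem 2 describes for the aperiodic case.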



(b) Correlation of a periodic D-MAP 

When there is more than one eigenvalue on the unit circle (apart from the
eigenvalue 1), then the situation is somewhat more complicated. It means that
the matrix $D_0 + D_1$ is periodic, i.e. there exist $\delta > 0$ (i.e. the number
of eigenvalues on the unit circle) and matrices $F_i$, $i = 1, 2, \ldots, \delta$, such that
$D_0 + D_1$ can be transformed into

$$D_0 + D_1 = \begin{pmatrix}
0 & F_1 & 0 & \cdots & 0 & 0 \\
0 & 0 & F_2 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 0 & F_{\delta-1} \\
F_\delta & 0 & 0 & \cdots & 0 & 0
\end{pmatrix} \qquad (3)$$

The $\delta$ different eigenvalues with $|\lambda_j| = 1$ are
$\{\lambda_j = e^{2\pi i j/\delta} \mid j = 0, 1, \ldots, \delta - 1\}$.
The convergence mentioned in Theorem 2 is no longer valid. However, it is
possible to prove the following result. We denote by $\oplus$ addition modulo $\delta$.



Theorem 3 Assume that the matrix $D_0 + D_1$ has $\delta$ eigenvalues on the unit
circle, i.e. it has the form as shown in (3). Let $\pi = (\pi_1, \ldots, \pi_\delta)$ (resp.
$\nu = (\nu_1, \ldots, \nu_\delta)$) be the left (resp. right) eigenvector of $D_0 + D_1$ with
eigenvalue one.

If $p_j = \pi_j F_j \nu_{j \oplus 1}$, then the $\delta$ possible limits of the $\delta$ subsequences of the
covariance sequence are given by

s 

l^cov{Xi,XnS+i) = {s ^ HiiJ,j,l = l,2,---,S. 

i=l j^i^l 



PROOF. See (Geerts 1997). ■ 

From this theorem we see that in the covariance sequence we can distinguish
$\delta$ subsequences, each converging to a different limit. Note that some of these
limiting values can be equal.



3.2 Limit of Index of Dispersion for Counts of a D-MAP 

An important measure for the correlation is the Index of Dispersion for Counts 
(IDC). 

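Denote by $P^{(k)}_{i,j}(l)$ the conditional probability that in $k$ slots there are $l$
arrivals and at the $k$-th slot the phase of the process is $j$, given that at time
$t = 0$ the phase was $i$. Let

$$P^{(k)} = \sum_{l \ge 0} P^{(k)}(l).$$

Clearly $P^{(k)} = (D_0 + D_1)^k$. We define the index of dispersion for counts
(IDC) as

$$C(k) = \frac{\mathrm{Var}[N^{(k)}]}{E[N^{(k)}]},$$

where $\mathrm{Var}[N^{(k)}]$ and $E[N^{(k)}]$ denote the scalar variance, resp. scalar mean, of
the variable $N^{(k)}$, the number of arrivals in $k$ slots. For a general stationary
arrival process, it is known that the following holds

$$C(k) = \frac{k\,\mathrm{cov}(X_1 X_1) + 2 \sum_{j=1}^{k-1} (k - j)\,\mathrm{cov}(X_1 X_{1+j})}{k\,E[X_1]}.$$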

It is also well known that for a process for which the number of arrivals in a
slot is renewal, $C(k) = \mathrm{Var}[X_1]/E[X_1]$ for all $k \ge 1$, i.e. the
variance-to-mean ratio of the number of arrivals in a slot. In particular for a
Bernoulli process, $C(k) = 1 - p$ (Sriram & Whitt 1986), where $p$ is the
probability of generating an arrival in a slot.
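The covariance form of the IDC translates directly into a few lines of code. The Python sketch below is an illustration only; the function name `idc` and the convention that `cov[j]` holds $\mathrm{cov}(X_1 X_{1+j})$ are choices made here. It is checked on the Bernoulli case, where all covariances at non-zero lag vanish.

```python
def idc(k, cov, mean):
    """Index of dispersion for counts over k slots.

    cov[j] is cov(X_1, X_{1+j}) for j = 0..k-1 and mean is E[X_1].
    """
    total = k * cov[0] + 2 * sum((k - j) * cov[j] for j in range(1, k))
    return total / (k * mean)

# Bernoulli arrivals with probability p: variance p(1 - p), zero covariance at other lags.
p = 0.3
bernoulli_cov = [p * (1 - p)] + [0.0] * 4
c5 = idc(5, bernoulli_cov, p)
```

For Bernoulli arrivals the result does not depend on `k`, in line with the constant value $1 - p$ quoted above.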

In contrast to the limit of the correlation of the arrival process of a D-MAP,
the limit of the IDC is not dependent on the periodicity of the transition
matrix $D_0 + D_1$. The limit of the IDC has a unique value. We give an explicit
expression of this limit for a D-MAP in the next theorem.



Theorem 4 Consider a D-MAP with ergodic Markov chain $D_0 + D_1$. The
limit of the IDC of this D-MAP is given by

$$\lim_{k \to +\infty} C(k) = \frac{\pi D_1 e - [\pi D_1 e]^2 + 2 \pi D_1 Z D_1 e - 2 [\pi D_1 e]^2}{\pi D_1 e} \qquad (4)$$

with $Z$ the fundamental matrix of the Markov chain $D_0 + D_1$, given by

$$Z = [I - (D_0 + D_1 - e\pi)]^{-1}.$$



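PROOF. In view of Theorem 1,

$$\sum_{j=1}^{k-1} \mathrm{cov}(X_1, X_{1+j})
= \sum_{j=1}^{k-1} \left[ \pi D_1 [D_0 + D_1]^{j-1} D_1 e - (\pi D_1 e)^2 \right]
= \pi D_1 \Big\{ \sum_{j=1}^{k-1} \left[ (D_0 + D_1)^{j-1} - e\pi \right] \Big\} D_1 e.$$

In view of (Kemeny & Snell 1967, Theorem 5.1.4, p.101), the following series
is Cesaro-summable

$$\sum_{j=1}^{\infty} \left[ (D_0 + D_1)^{j} - e\pi \right]$$

and the Cesaro-limit is given by $Z - I$, with $Z = [I - (D_0 + D_1 - e\pi)]^{-1}$
the fundamental matrix of the Markov chain $D_0 + D_1$. Adding the term
$I - e\pi$ of the zeroth power gives $Z - e\pi$ for the sum in braces, and from
this we obtain expression (4). ■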
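A numerical check of the limit is straightforward for small chains. The Python sketch below computes $[\mathrm{cov}(X_1 X_1) + 2 \sum_{j \ge 1} \mathrm{cov}(X_1 X_{1+j})] / E[X_1]$, evaluating the infinite sum as $\pi D_1 (Z - e\pi) D_1 e$ under the covariance indexing of Theorem 1. The helper names, the Gauss-Jordan inverse and the two example D-MAPs are assumptions of this illustration.

```python
def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (intended for small dense matrices)."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        d = a[c][c]
        a[c] = [x / d for x in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]

def idc_limit(D0, D1, pi):
    """Limit of the IDC via the fundamental matrix Z = [I - (D0 + D1 - e pi)]^-1."""
    n = len(pi)
    Z = inverse([[(i == j) - (D0[i][j] + D1[i][j] - pi[j]) for j in range(n)]
                 for i in range(n)])
    left = [sum(pi[i] * D1[i][j] for i in range(n)) for j in range(n)]   # pi D1
    right = [sum(row) for row in D1]                                     # D1 e
    lam = sum(left)                                                      # pi D1 e
    quad = sum(left[i] * Z[i][j] * right[j] for i in range(n) for j in range(n))
    return (lam - lam ** 2 + 2 * quad - 2 * lam ** 2) / lam

# Hypothetical aperiodic example (stationary vector worked out by hand).
aperiodic = idc_limit([[0.6, 0.1], [0.2, 0.2]], [[0.2, 0.1], [0.3, 0.3]], [5 / 7, 2 / 7])
# Strictly alternating arrivals (periodic chain): counts have bounded variance.
alternating = idc_limit([[0, 0], [1, 0]], [[0, 1], [0, 0]], [0.5, 0.5])
```

The alternating example is a useful sanity check: one arrival every second slot has bounded count variance, so its IDC limit should vanish even though the chain is periodic.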



4 CORRELATION STRUCTURE OF INTERARRIVAL TIMES 
OF A D-MAP 

Let $(T_1, \ldots, T_k)$ be random variables representing the interarrival times
until the $k$-th arrival. Based on the results in (Blondia & Theimer 1989) it is
straightforward to prove the following expression for the scalar covariance of
$T_1$ and $T_k$, $k \ge 1$.

Theorem 5 The scalar covariance of $T_1$ and $T_k$ is given by the following
formulas:

$$\mathrm{cov}(T_1 T_1) = 2p[(I - D_0)^{-1}]^2 e - [p(I - D_0)^{-1} e]^2 - p(I - D_0)^{-1} e$$

$$\mathrm{cov}(T_1 T_{k+1}) = p(I - D_0)^{-1} [(I - D_0)^{-1} D_1]^{k} (I - D_0)^{-1} e - [p(I - D_0)^{-1} e]^2, \quad k \ge 1.$$



4.1 Correlation of Interarrival Times of a D-MAP 

The analysis of the correlation decay for interarrival times of a D-MAP is
completely analogous to the analysis of Section 3.1. Again, the correlation
structure depends on the (a)periodicity of $(I - D_0)^{-1} D_1$. The geometric decay
in the aperiodic case is determined by the eigenvalue $\mu_2$ of $(I - D_0)^{-1} D_1$
with the largest absolute value, excluding one. An analogue of Theorem 3 can
also be stated here in case $(I - D_0)^{-1} D_1$ is periodic.



4.2 Limit of the Index of Dispersion for Interarrival Times 
of a D-MAP 

The dependence among successive interarrival times can be expressed by
means of the Index of Dispersion for Intervals (IDI). The IDI, also called the
$k$-interval squared coefficient of variation sequence, is defined as the sequence
$I(k)$, $k \ge 1$, given by

$$I(k) = \frac{\mathrm{Var}[T_1 + \cdots + T_k]}{k\,(E[T_1])^2}.$$

It is well known that the following holds:

$$I(k) = \frac{k\,\mathrm{cov}(T_1 T_1) + 2 \sum_{j=1}^{k-1} (k - j)\,\mathrm{cov}(T_1 T_{1+j})}{k\,(E[T_1])^2}.$$

When the interarrival time distribution is renewal, then $I(k) = c_1^2$ for all
$k \ge 1$, where $c_1^2$ is the squared coefficient of variation of a single
interarrival time (Sriram & Whitt 1986).

The limit of the IDI is an important measure to characterize the effect of
an arrival process on the congestion of a queue in heavy traffic (Iglehart &
Whitt 1970).
Theorem 6 The limit of the IDI of a D-MAP is given by

$$\lim_{k \to +\infty} I(k) = \lambda^2 \left\{ 2p(I - D_0)^{-1} W (I - D_0)^{-1} e - [p(I - D_0)^{-1} e]^2 - p(I - D_0)^{-1} e \right\},$$

with $W$ the fundamental matrix of the Markov chain $(I - D_0)^{-1} D_1$, given by

$$W = [I - ((I - D_0)^{-1} D_1 - ep)]^{-1}.$$

PROOF. Follow a similar reasoning as in Theorem 4. ■

In (Cox & Lewis 1966) it is shown that the limit of the IDC and the limit of
the IDI coincide, i.e. $\lim_{k \to \infty} C(k) = \lim_{k \to \infty} I(k)$.
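The expression of Theorem 6 can be evaluated in the same way as the IDC limit. The Python sketch below does so for a made-up two-state D-MAP; the helper names, the Gauss-Jordan inverse, the example matrices and the hand-computed values of $p$ and $\lambda$ are assumptions of this illustration.

```python
def inverse(m):
    """Gauss-Jordan inverse with partial pivoting (intended for small dense matrices)."""
    n = len(m)
    a = [list(map(float, row)) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        d = a[c][c]
        a[c] = [x / d for x in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[n:] for row in a]

def idi_limit(D0, D1, p, lam):
    """Limit of the IDI following the expression of Theorem 6."""
    n = len(p)
    M = inverse([[(i == j) - D0[i][j] for j in range(n)] for i in range(n)])   # (I - D0)^-1
    P = [[sum(M[i][k] * D1[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    W = inverse([[(i == j) - (P[i][j] - p[j]) for j in range(n)] for i in range(n)])
    pM = [sum(p[i] * M[i][j] for i in range(n)) for j in range(n)]             # p (I - D0)^-1
    Me = [sum(row) for row in M]                                               # (I - D0)^-1 e
    mean_t = sum(p[i] * Me[i] for i in range(n))                               # E[T_1]
    quad = sum(pM[i] * W[i][j] * Me[j] for i in range(n) for j in range(n))
    return lam ** 2 * (2 * quad - mean_t ** 2 - mean_t)

# Hypothetical two-state D-MAP; p and lam were worked out by hand for these matrices.
D0 = [[0.6, 0.1], [0.2, 0.2]]
D1 = [[0.2, 0.1], [0.3, 0.3]]
limit = idi_limit(D0, D1, [16 / 27, 11 / 27], 2.7 / 7)
```

For this example the value agrees with the limit of the IDC computed from the count covariances of the same process, as the result of Cox & Lewis predicts.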



5 NUMERICAL EXAMPLES 

In this Section we apply the results obtained in Section 3 to the two special 
output processes considered in Section 2. 



5.1 Example 1 

We consider an ATM multiplexer whose input consists of a number of on/off
sources. Both the on and off periods are geometrically distributed, and while a
source is in the on period cells arrive with interarrival time $d$ slots. In
(Blondia & Casals 1992), it has been shown that this superposition can be
adequately approximated by means of a D-BMAP. The resulting model for the ATM
multiplexer is a D-BMAP/D/1/N queue.

We apply the results described in the previous sections to characterize the
correlation structure of the output process. In particular, we illustrate the
effect the burstiness of the input sources has on the IDC of the output process.
We consider three types of sources, each having the same arrival rate, but
with varying burstiness. Clearly type 1 is more bursty than type 2 and type
2 is more bursty than type 3 (see Table).

We let $M = 7$ sources enter the multiplexer. The buffer of the multiplexer is
assumed to be $N = 5$. Figure 2 shows an increasing IDC (and its limit) for
an increasing burstiness of the input traffic.
Figure 2 IDC for variable burstiness of input traffic

Type    mean on period  mean off period  d   limit IDC
type 1  120             880              2   69.30
type 2  300             700              5   54.56
type 3  600             400              10  24



5.2 Example 2 

Consider a CBR source with interarrival time 4 slots which is mixed with
a Poisson traffic stream in a multiplexer. The corresponding matrix of the
resulting D-MAP output process has eigenvalue 1 with multiplicity 4. Figure
3 shows the behavior of the covariance function. Figure 4 shows a detail
of Figure 3; it clearly illustrates Theorem 3. We distinguish four subsequences
of the covariance sequence.

Figure 3 Covariance of a tagged output process

Figure 4 Covariance of a tagged output process: detail
5.3 Example 3



In this example we tag a CBR connection (interarrival time 6
slots) and let it share a multiplexer with Poisson input (= background traffic)
with variable arrival rate $\lambda = 0$, $\lambda = 0.4$ and $\lambda = 0.8$. Figure 5 shows how
the covariance changes with the rate $\lambda$ of the background traffic. The higher
the rate, the flatter the covariance curve, and hence the less important the
correlations are.

Figure 5 Covariance for variable Poisson background traffic

Figure 6 Covariance against variability tagged source



5.4 Example 4 

In this example we illustrate the impact of the variability of the arrival process
of the tagged source on the IDC of the output process. We consider three types
of traffic:

• type 1: $b$ = [0 0 0 0 1]

• type 2: $b$ = [0 0 0.3 0 0 0 0.7]

• type 3: $b$ = [0.1 0 0 0 0.1 0 0 0 0 0.7]

From Figure 6 it follows that the higher the variability of the input stream,
the larger the covariance (and its limit) is.



6 CONCLUSIONS 

In this paper we have investigated the correlation structure of two important 
models for the output process of an ATM multiplexer. We have given closed 
formulas for the IDC and IDI and for their limits. Moreover the limiting 
behavior of the covariance function is also characterized. 

These measures of the output process are very useful when investigating
end-to-end performance in an ATM network. In particular these characteristics
will be used to describe ATM input traffic to intermediate nodes in future
work.
PART FOUR



Call Admission Control (CAC) 




Call blocking in multi-services systems on 
one transmission link. 
Institut National des Telecommunications, Dept. RST
Contact: Gerard.Hebuterne@int-evry.fr 



Abstract 

Future broadband integrated services digital networks (B-ISDN) are
expected to use the Asynchronous Transfer Mode (ATM) technology and
support multiple services. In the multiservices context, three Connection
Admission Control (CAC) strategies with guaranteed Grade of Service
(GoS) are presented: Complete Sharing (CS), Complete Sharing with
Equalization (CSE) and Routes Separation (RS). The methods are described
and compared in order to identify their most suitable operating
regions. A mapping scheme selecting the appropriate CAC method according
to traffic conditions and environments is then deduced. Performance
results are reported for a set of reference scenarios.

Key words: ATM, multiple services, CAC, equalization, call blocking 



1 Introduction. 

The presence of different emerging services with specific GoS in future ATM
networks requires the development of new service acceptance models. There
are three levels of acceptance: cell level, burst level and call level (see [ROB92,
RMV96]). Most studies have focused on cell and burst levels. Studies on call
level and multi-rate traffic are not as common. The thrust of this work is to
conduct the analysis at the call level in the presence of multi-rate traffic sources.

The connection oriented property of ATM suggests the allocation of part of 
the resource for the entire connection life. Assuming that an equivalent band- 
width characterization of Variable Bit Rate (VBR) sources is adopted, only 
Constant Bit Rate (CBR) calls are considered. Refer to [RTG94] for the deriva- 
tion of the equivalent bandwidth for VBR traffic. Only one unique transmission 
link receiving input traffic resulting from the superposition of N traffic classes 
is considered in this study. Three CAC strategies are defined for this multi 
service system where each class requires a different GoS. Exact results are pro- 
vided for two of these methods while an approximate solution is reported for 
the remaining scheme. 




Much work has been devoted to the issue of determining GoS allocation to 
multiservices connections [SkR93, Ros95], etc. Here, we extend this approach 
by showing how to give the same GoS to connections with different bit rate 
requirement. 



2 Description of call admission methods. 

Arriving calls are accepted only if the available link capacity is greater than
or equal to the required call bit rate. A call of class $i$ ($i = 1, \ldots, N$) with bit rate
requirement $d_i$ is blocked with a probability $B_i$. Further, the total capacity is
denoted as $C$ while $C_r$ represents the available resource capacity. Given these
notations, the three methods developed in [RKK88] are described.



2.1 Complete Sharing (CS) method. 

In this most often used method, the transmission link can be assigned to 
any call type or class, see Figure 1. 



Figure 1: Complete Sharing (calls 1, ..., N offered to a transmission link of C Mbps)
The condition for call acceptance is as follows: 

Call Admission Control (CAC) 1 (CS) 

An arriving call of class i will be accepted if and only if the available link capacity 
Cr is greater than or equal to the bit rate requirement di. 

$C_r \ge d_i$ (1)

2.2 Complete Sharing with Equalization (CSE) method. 

The CS method exhibits the drawback of causing higher blocking rates to
calls with higher bit rate requirements $d_i$ while favoring calls with lower bit rate
$d_j$ ($B_i > B_j$). In order to provide fair access to all classes, despite their different
bit rate requirements, an equalization mechanism can be introduced.

2.2.1 Definition of equalization mechanism. 

The rule for equalizing call blocking probabilities presented in [RTG94] is 
repeated below: 
Mechanism 1 (Equalization)

Given $N$ different classes, call blocking probabilities $B_i$ are equal if and only if
an acceptance threshold $\theta$ set to $\max_i d_i$ is used (recall that $d_i$ is the bit rate
requirement of class $i$):

$$\theta = \max_i \{d_i\}$$

Note that the threshold $\theta$ is the same for all classes. An algorithm for
call acceptance, based on the complete sharing strategy with the equalization
mechanism, is defined in the next section.

2.2.2 CS with Equalization of call blockings = Equalization. 

With the equalization mechanism, only calls from the class with the highest bit
rate requirement can access the whole link capacity, see Figure 2.



Figure 2: Complete Sharing with Equalization (calls 1, ..., N offered to a transmission link of C Mbps)
The call acceptance criterion becomes as follows: 

Call Admission Control (CAC) 2 (CSE) 

An arriving call of class $i$ will be accepted if and only if the available link
capacity $C_r$ is greater than or equal to the threshold $\theta$.

$C_r \ge \theta$ (2)



2.3 Route Separation (RS) method. 

In this strategy, the link is divided into resource sub-groups. There are as
many sub-groups as call classes, with $C_i$ as the link capacity of class $i$ sub-group,
see Figure 3.

Figure 3: Routes Separation
Thus, the condition of acceptance is given as: 
Call Admission Control (CAC) 3 (RS)

An arriving call of class $i$ will be accepted if and only if the available capacity
$C_{r,i}$ of the class $i$ sub-group is greater than or equal to the bit rate requirement $d_i$.

$C_{r,i} \ge d_i$ with $\sum_i C_i = C$ (3)

This method is also referred to as Class Limitation, Complete Separation,
Complete Partitioning, and so on in the literature.
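The three acceptance conditions (1) to (3) differ only in what the free capacity is compared against. The Python sketch below states them side by side; the function names and the bit rates used in the example are hypothetical and serve only to illustrate the rules.

```python
def accept_cs(free_capacity, d_i):
    """Complete Sharing: admit if the free link capacity covers the call's own bit rate."""
    return free_capacity >= d_i

def accept_cse(free_capacity, all_rates):
    """Complete Sharing with Equalization: every class is tested against theta = max d_i."""
    return free_capacity >= max(all_rates)

def accept_rs(free_subgroup_capacity, d_i):
    """Route Separation: admit if the sub-group reserved for the class has enough room."""
    return free_subgroup_capacity >= d_i

rates = [64, 2048]          # hypothetical bit rate requirements in Kb/s
free = 1000                 # hypothetical free capacity on the shared link in Kb/s

narrow_cs = accept_cs(free, rates[0])     # the narrowband call fits under CS
wide_cs = accept_cs(free, rates[1])       # the wideband call does not
narrow_cse = accept_cse(free, rates)      # under CSE the narrowband call is refused as well
```

The last line shows the equalizing effect: with the common threshold, a call is refused whenever the most demanding class would be refused.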



3 Comparison of access control strategies. 

In call level CAC, the key GoS parameters are: bit rate requirement, arrival
rate, holding time and blocking probability. Type $i$ arrival traffic is assumed to
follow a Poisson process with rate $\lambda_i$. During the holding time of a class $i$ call,
assumed to have a negative-exponential distribution function with mean $1/\mu_i$,
a constant bit rate $d_i$ is reserved for this call until completion.

A scenario with a mix of two different traffic classes on the same link is
analyzed. The total offered traffic is kept constant. Link capacity is increased
to achieve a blocking probability less than or equal to 1% while keeping the
product $\rho \cdot C$ (that is, the total offered traffic) constant. Variable $\rho$ represents
the utilization factor, which must be decreased accordingly. This guarantees
operation under the 1% blocking condition. Bit rates for each class as well as
the offered traffic ratio are known. The normalized offered traffic from a class $i$
call is denoted as $A_i$. The ratio $A_1/A_2$ varies from 0.01 to 100.

Since we operate in regions of low blocking ratios (less than 1%), no
distinction is made between offered and carried traffic.

$$\rho \cdot C = \sum_i A_i \quad \text{with} \quad A_i = \frac{\lambda_i}{\mu_i}\, d_i \qquad (4)$$

In this section, RATIO represents $A_1/A_2$, where $A_i$ is the offered traffic for a
call of class $i$. Parameters are given in the following Table 1.



Calls    d_i        A_i
Class 1  d_1 Kb/s   Offered traffic / (1 + 1/RATIO)
Class 2  d_2 Kb/s   Offered traffic / (1 + RATIO)

Table 1: Parameters of the comparison



The bandwidth values are discretized: a basic bandwidth unit $\Delta C$ is defined
using the gcd function (gcd means greatest common divisor):

$$\Delta C = \gcd\{d_i\}, \quad 1 \le i \le N \qquad (5)$$
The maximum number of available basic bandwidth units is denoted as $M$
and the number of required basic bandwidth units (by class $i$) is $\delta_i$. For the
recursive solution, states of the traffic model are defined by $m$
($m = 0, 1, \ldots, M$) bandwidth units.

$$M = \frac{C}{\Delta C}, \qquad \delta_i = \frac{d_i}{\Delta C} \qquad (6)$$
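The discretization of Equations (5) and (6) is a one-line computation once the bit rates are integers. The Python sketch below illustrates it; the function name `discretize` and the capacity and bit rate values are hypothetical, chosen so that the two classes need $C/5$ and $C/2.5$ respectively.

```python
from functools import reduce
from math import gcd

def discretize(capacity, rates):
    """Return (basic unit, M, [delta_i]) for integer capacity and bit rates.

    The basic unit is the gcd of the bit rates; capacity is assumed to be a multiple of it.
    """
    unit = reduce(gcd, rates)
    return unit, capacity // unit, [d // unit for d in rates]

unit, M, deltas = discretize(10000, [2000, 4000])   # hypothetical values in Kb/s
```

With these values the link offers five basic units, and the two classes occupy one and two units per call.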



3.1 Analysis of the Complete Sharing method 

Two approaches may be used in order to obtain exact call blocking proba- 
bilities for this access control strategy, namely : 



• The first one is based on a product form solution as described in [EMi73].
The system state is defined as the number of accepted calls from each class
$(n_1, \ldots, n_N)$. The multi-dimensional state space has as many dimensions
as the number of traffic classes. This leads to the typical state explosion
problem. An example of the state space is depicted in Figure 4.




Figure 4: State space example for the product form solution ($N = 2$, $C_1 = C/5$,
$C_2 = C/2.5$)



• In the second approach, the multi-dimensional state space is mapped into a 
one-dimensional state space without affecting the resulting blocking prob- 
abilities. These results are given using a recursive solution according to 
the algorithm proposed in [DRo87]. This method, suitable for alleviating 
the state explosion problem, will be explained in the following paragraph. 
A state diagram is given in Figure 5. 

The unnormalized state probabilities can be derived using the following
recursive algorithm:

\[ p(m) = \begin{cases} 1 & m = 0 \\ 0 & m < 0 \\ \dfrac{1}{m} \sum_{i=1}^{N} p(m - s_i) \cdot s_i \cdot \dfrac{\lambda_i}{\mu_i} & 0 < m \le M \end{cases} \qquad (7) \]

Figure 5: State space reduction for recursive solution ($M = 5$, $s_1 = 1$, $s_2 = 2$)

After normalization, the state probabilities $\bar{p}(m)$ and the blocking probability
$B_i$ for calls of class $i$ are obtained as:

\[ \bar{p}(m) = p(m) \cdot \Big( \sum_{m=0}^{M} p(m) \Big)^{-1}, \qquad B_i = \sum_{m=M-s_i+1}^{M} \bar{p}(m) \qquad (8) \]

Using Equation 8 combined with Equations 4 through 7, we can derive the
total capacity of the link under the constraint of 1% call blocking and thus the
total offered traffic for the link.

\[ \max_i(B_i) = \max_i \Big( \sum_{m=M-s_i+1}^{M} \bar{p}(m) \Big) \le 0.01 \qquad (9) \]

The total capacity of the link is then obtained as $C = M \cdot \Delta C$, with the
load satisfying the following equation:

\[ \rho M = \sum_i \rho_i s_i \quad \text{or} \quad \rho C = \sum_i \rho_i d_i \qquad (10) \]
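The recursion of Equation 7, the normalization and blocking of Equation 8, and the 1% search of Equation 9 can be sketched as follows. The function names `complete_sharing_blocking` and `smallest_link` are hypothetical, loads are passed as $\lambda_i/\mu_i$ in erlangs, and the search simply tries increasing link sizes; the paper does not specify how the search is carried out.

```python
def complete_sharing_blocking(M, s, a):
    """Per-class blocking under complete sharing (Equations 7 and 8).

    M: link size in basic bandwidth units
    s: units required by one call of each class (s_i)
    a: offered load of each class in erlangs (lambda_i / mu_i)
    """
    q = [0.0] * (M + 1)              # unnormalized p(m)
    q[0] = 1.0
    for m in range(1, M + 1):
        acc = 0.0
        for s_i, a_i in zip(s, a):
            if m - s_i >= 0:         # p(m) = 0 for m < 0
                acc += q[m - s_i] * s_i * a_i
        q[m] = acc / m
    norm = sum(q)
    p = [x / norm for x in q]        # normalized state probabilities
    # A class-i call is blocked when fewer than s_i units are free.
    blocking = [sum(p[max(0, M - s_i + 1):]) for s_i in s]
    return p, blocking


def smallest_link(s, a, target=0.01, max_units=10_000):
    """Smallest M whose worst per-class blocking meets the target (Equation 9)."""
    for M in range(1, max_units + 1):
        _, blocking = complete_sharing_blocking(M, s, a)
        if max(blocking) <= target:
            return M
    return None


p, blocking = complete_sharing_blocking(5, [1, 2], [1.0, 1.0])
print(blocking)                      # the wider class sees the higher blocking
print(smallest_link([1], [1.0]))     # single class: reduces to Erlang dimensioning
```

With a single class of unit bandwidth the recursion reduces to the Erlang loss formula, which gives a convenient sanity check.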

3.2 Analysis of the Equalization Method 

Since exact solutions for the CSE method do not exist, an approximation,
based on the recursive solution, is proposed in [ROB92]. The states are defined
as before by the number of occupied basic bandwidth units, but the state space
description is slightly different since some of the transitions between states
disappear following the introduction of the threshold $\theta$. Figure 6 gives the state
space diagram for the Equalization method.




Figure 6: State space example for recursive solution with equalization ($M = 5$,
$s_1 = 1$, $s_2 = 2$)










Analysis of the equalization mechanism requires the introduction of a new
function $G_i(m - s_i)$ integrating the notion of threshold in call acceptance
management. It is defined as:

\[ G_i(m - s_i) = \begin{cases} s_i & (m - s_i)\,\Delta C \le C - \theta_i \\ 0 & (m - s_i)\,\Delta C > C - \theta_i \end{cases} \qquad (11) \]



The unnormalized state probabilities can be obtained using the following
recursion (compare Equation 7), leading to an approximate solution:

\[ p^*(m) = \begin{cases} 1 & m = 0 \\ 0 & m < 0 \\ \dfrac{1}{m} \sum_{i=1}^{N} p^*(m - s_i) \cdot G_i(m - s_i) \cdot \dfrac{\lambda_i}{\mu_i} & 0 < m \le M \end{cases} \qquad (12) \]



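After normalization, the state probabilities $\bar{p}^*(m)$ and the blocking probability $B_i^*$
for class $i$ calls are:

\[ B_i^* = \sum_{m = \min\{M - s_i,\ (C - \theta_i)/\Delta C\} + 1}^{M} \bar{p}^*(m) \quad \text{with} \quad \bar{p}^*(m) = p^*(m) \cdot \Big( \sum_{m=0}^{M} p^*(m) \Big)^{-1} \qquad (13) \]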

Total link capacity and offered bit rate can be evaluated using the same steps 
as in the CS method. 
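A possible reading of the approximate recursion with thresholds is sketched below. It rests on an assumption that should be checked against [ROB92]: a class $i$ call is accepted in state $n$ (occupied units before acceptance) only if $n \le \min\{M - s_i, M - \theta_i/\Delta C\}$. The function name `equalized_blocking` and the parameter `theta_units` (thresholds already expressed in basic bandwidth units) are hypothetical.

```python
def equalized_blocking(M, s, a, theta_units):
    """Approximate per-class blocking with acceptance thresholds.

    M: link size in units; s: units per call; a: loads lambda_i / mu_i;
    theta_units: threshold of each class in units (assumed convention:
    class i is accepted only while the occupancy n satisfies
    n <= min(M - s_i, M - theta_units[i])).
    """
    limit = [min(M - s_i, M - t_i) for s_i, t_i in zip(s, theta_units)]
    q = [0.0] * (M + 1)              # unnormalized p*(m)
    q[0] = 1.0
    for m in range(1, M + 1):
        acc = 0.0
        for s_i, a_i, lim in zip(s, a, limit):
            n = m - s_i              # state before the class-i arrival
            if 0 <= n <= lim:        # transition kept only below the threshold
                acc += q[n] * s_i * a_i
        q[m] = acc / m
    norm = sum(q)
    p = [x / norm for x in q]
    blocking = [sum(p[max(0, lim + 1):]) for lim in limit]
    return p, blocking


# With a common threshold equal to the largest call size, both classes are
# blocked in exactly the same states, so their blocking is equalized.
_, blocking = equalized_blocking(5, [1, 2], [1.0, 1.0], [2, 2])
print(blocking)
```

Setting every threshold to zero removes the extra restriction and gives back the complete sharing recursion.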

3.3 Route separation. 

In this strategy, each call class is assigned to its dedicated transmission link
sub-group. To obtain the total link capacity, the link capacities of all sub-groups
must be summed. Since mixing or multiplexing of classes does not occur in
RS, the call blocking probability for each class can be obtained directly from the
Erlang loss formula. Thus call blocking as a function of the number of calls of class
$i$, $N_i$, is easily obtained as follows:

\[ B_i = E(\rho_i, N_i) = \frac{\rho_i^{N_i} / N_i!}{\sum_{j \le N_i} \rho_i^{j} / j!} \quad \text{with} \quad \rho_i = \frac{\lambda_i}{\mu_i} \qquad (14) \]

Total link capacity is calculated directly through:

\[ C = \sum_i N_i \cdot d_i = \sum_i C_i \qquad (15) \]
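For route separation the dimensioning is a per-class Erlang calculation, which can be sketched as below. The function names `erlang_b` and `route_separation_capacity` are hypothetical; the Erlang loss formula is evaluated with its standard recursion rather than with factorials, purely for numerical stability.

```python
def erlang_b(rho, n):
    """Erlang loss formula E(rho, n), via E(rho, k) = rho*E(rho, k-1) / (k + rho*E(rho, k-1))."""
    e = 1.0                          # E(rho, 0) = 1
    for k in range(1, n + 1):
        e = rho * e / (k + rho * e)
    return e


def route_separation_capacity(rho, d, target=0.01):
    """Smallest N_i per class with E(rho_i, N_i) <= target, and C = sum of N_i * d_i."""
    sizes = []
    for rho_i in rho:
        n = 0
        while erlang_b(rho_i, n) > target:
            n += 1
        sizes.append(n)
    capacity = sum(n * d_i for n, d_i in zip(sizes, d))
    return sizes, capacity


# Illustrative values only: two classes with 1 and 5 units per call.
print(route_separation_capacity([10.0, 1.0], [1, 5]))
```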

4 Numerical comparison of the control algorithms 

The selected bandwidth unit is 100 Kbps. Experiments are conducted with
offered traffic of 25 Mbps and 155 Mbps, typical ATM transmission link capacities.

In this section numerical results are displayed corresponding to the previous
comparison. Curves show carried traffic when ensuring $B_1$ and $B_2$ less than or
equal to 1%.

4.1 25 Mbps links. 

Four tests with different bit rate requirements were conducted for each con- 
trol strategy for a given offered traffic load. The initial objective was to extract 
from these tests the most appropriate control strategy for each scenario. Total 
link traffic is fixed at 25 Mbps, and test parameters are given in Table 2. 



Calls      TEST 1                  TEST 2                  TEST 3
Class 1    10 Kb/s => 0.1 unit     200 Kb/s => 2 units     1 Mb/s => 10 units
Class 2    50 Kb/s => 0.5 unit     2 Mb/s => 20 units      3 Mb/s => 30 units

Table 2: Tests with 25 Mb/s




Figure 7: First test with 25 Mb/s 

Results from these tests are reported in Figures 7 and 8, and the following
observations can be made.

• Complete sharing with equalization always gives better results than the
two other methods.

• Simple complete sharing gives the same results as Equalization when traffic
from the class with the greater bit rate requirement is ten times greater than
that of the other traffic class. Over all test cases, Equalization is always
better for higher ratios.

• Routes separation provides in general worse results than complete sharing.
It is only interesting for the first test and when $A_1$ is much greater (a factor
of one hundred or more) than $A_2$.

• For all bit rate requirements the results remain consistent, with Equalization
achieving the best performance across all tests.




Figure 8: Second and Third tests with 25 Mb/s (horizontal axis: offered traffic
ratio $A_1/A_2$, from 0.01 to 1000)

4.2 155 Mbps links

Total link traffic is fixed at 155 Mbps for these three experiments. Other 
parameters are described in Table 3. 



Calls      TEST 1                  TEST 2                  TEST 3
Class 1    100 Kb/s => 1 unit      200 Kb/s => 2 units     1 Mb/s => 10 units
Class 2    500 Kb/s => 5 units     2 Mb/s => 20 units      3 Mb/s => 30 units

Table 3: Tests with 155 Mb/s

The results are displayed in Figures 9 and 10. For the tests at 155 Mbps,
the same conclusions can be drawn: complete sharing with call
blocking equalization performs better than the two other methods, while routes
separation provides the worst results.




Figure 9: First test with 155 Mb/s. 



5 Accuracy of the Equalization method. 

As seen in Section 3, exact call blocking probabilities can be obtained using
a recursive solution in the case of the Complete Sharing mechanism. But no
exact solution exists for the Complete Sharing method with equalization. Several
approximations were proposed in [RMV96]. The most accurate (which is used
here) is based on the recursive solution with the introduction of the threshold
notion. Tests were conducted, using simulation, to assess the accuracy of this
approximation.

• First, the call blocking probabilities are observed for varying holding time
ratios.

• In a second step, the influence of the relative loads is observed.

• The last test attempts to justify the interest of the equalization method.





Figure 10: Second and Third tests with 155 Mb/s. 

All tests were conducted on one single link of constant capacity $C$ and two
call classes. All parameters are given in units of bandwidth.

5.1 Variation of the Holding time ratio.

In this part, the ratio of service rates varies from 0.1 to 100. The ratio is
expressed by $\mu_0/\mu_1$. The total link capacity is fixed: $C = 30$ units.

In each experiment, we compare the blocking probabilities of each class as a
function of the holding time ratio, first without the Equalization mechanism
(Simple Complete Sharing) and then with this fairness mechanism (Complete
Sharing with Equalization).

5.1.1 Tests description. 

Two experiments were conducted, as summarized in Table 4.

                        Heavy load traffic                          Low load traffic
                        Class 0            Class 1                  Class 0            Class 1
Arrival rate            $\lambda_0 = 10$   $\lambda_1 = 5 \cdot \mu_1$   $\lambda_0 = 10$   $\lambda_1 = \mu_1$
Service rate            $\mu_0 = 1$        $\mu_0$/ratio            $\mu_0 = 1$        $\mu_0$/ratio
Bit rate requirement    $d_0 = 1$          $d_1 = 5$                $d_0 = 1$          $d_1 = 5$
Total traffic           $\rho_H = 1.17$                             $\rho_L = 0.5$

Table 4: Parameters for the holding time ratio variation

The two methods have been tested under the same conditions, both using 
the analytical approach and a simulation. 

5.1.2 Influence of the Holding times 

Results for high traffic load and low traffic load are given in Figures 11
and 12. Simulation results were obtained by computing 5 million events. The
imprecision is lower than 10% at a 95% confidence level. The continuous lines
represent results of the analytical method (the exact one for CS, the approximation
for CSE), while the points are simulation results.

The curves suggest the following comments.

• On the whole, the results are in accordance. This validates the models.

• The "exact" solution for CS gives constant values for losses as the ratio
varies, and so does the approximation. However, simulation results show
a clear influence of the ratio $\mu_0/\mu_1$, especially for low load. The recursive
solution with equalization is unable to capture this effect (which is not
surprising, since only the loads $\rho_i$ are input parameters for the model).
The phenomenon has already been reported in [RMV96] and it has no
clear explanation.

5.2 Variation of the Offered load ratio.

In this part, the traffics $\rho_i$ vary in such a way that the overall load is kept
constant, so as to compare their impact on the efficiency of the mechanisms.
For this test, the total link capacity $C = 30$ is fixed and the total carried traffic is
$\rho = 70\%$. Other parameters are given in Table 5.





                        Class 0          Class 1
Service rate            $\mu_0 = 10$     $\mu_1 = 1$
Bit rate requirement    $d_0 = 1$        $d_1 = 5$

Table 5: Parameters for the variation of load ratio

The analytical results are given in Figure 13. The simulation results are
omitted; they are in complete agreement with the analytical ones.

Figure 13: Influence of the ratio $\rho_0/\rho_1$. Total Load = 0.7



5.3 Importance of the approximation. 

To check that the Complete Sharing with Equalization method is necessary
and gives better results than the Simple Complete Sharing mechanism, we made
some other tests.

• First, the offered load is increased (= total link capacity).

• Then, the bit rate required by the second class is decreased.

• In the last test, the arrival rate of class 0 is decreased.

Both simulation and the analytical algorithm were used. Results obtained
with simulations and with the model algorithms are similar, at a 90% confidence
level. Thus, we only show curves obtained from the model.

5.3.1 Parameters’ presentation. 

Parameters for the three tests are given in Table 6, which shows the
variations applied in order to reduce call losses.

We compare the call losses of each class without the Equalization mechanism with
the call losses obtained using this fairness method.





                    Basic parameters               Tot. Cap.             $\lambda_0 \searrow$    $d_1 \searrow$
Class               0        1                     0        1            0        1              0        1
$\lambda_i$ =       10       $5 \cdot \mu_1$       =        =            to 8     =              =        =
$\mu_i$ =           1        $\mu_0$/ratio         =        =            =        =              =        =
$d_i$ =             1        5                     =        =            =        =              =        to 4
Tot. Cap.: $C$      30                             to 35                 =                       =

Table 6: Heavy load traffic parameters



5.3.2 Verification results. 

Results for the offered load increase, the bit rate requirement decrease
and the arrival rate decrease are shown in Figures 15, 16 and 17, respectively.



6 Conclusion and further study. 

The Complete Sharing with Equalization method can achieve gains in carried
bit rate of 8% compared with the Complete Sharing method and of 16% compared
with the Routes Separation strategy. Of all the schemes, Routes Separation is
not attractive, not only because of its poor performance but also because of the
increase in resource management complexity and service deployment.

Complete sharing, easier to implement, exhibits fairness problems under 
certain conditions as evidenced by results obtained from various tests. 

Equalization and Routes Separation provide fair access to the resource. 
Equalization achieves better utilization of the network resource, however. 

CS respects the required GoS ($B_1$, $B_2$ less than or equal to 1%) while achieving
extremely low call blocking, $B_1$, for class 1. In some configurations it leads to a
network utilization as high as Equalization. In all tests Equalization achieves a
better link utilization but the price to pay is threshold management. The choice
of acceptance threshold (which is beyond the scope of this paper) is far from
trivial in terms of implementation in a real size network. It amounts to knowing
in advance all possible traffic classes, and to setting up a GoS management policy
among these classes. However, this additional complexity cost can be justified
by the afforded capacity gains.

Figure 16: Bit rate requirements decrease (horizontal axis: class 1 bit rate
requirement, from 4 to 5)

These results have been obtained with two different classes. Extensions to 
more than two classes are being pursued. In such configurations, more complex 
policies, in which the equalization principle applies to a subset of the population 
according to a class type selection or GoS level classification, must be derived. 

The present study conducted in isolation considers only a single link. In order 
to use the equalization mechanism for CAC, equalization must be performed 
throughout the network on each stage of the path. 

Future work will address all these issues.

Abstract

In this paper we present an algorithm for making connection admission decisions in 
ATM networks, using measurements made on existing connections and the declared 
parameters of the new connections. Our scheme makes use of the shape-function, a 
concept developed by Botvich and Duffield [BD95], which arises in the application 
of Large Deviation theory to queueing systems. By estimating the shape-function of 
the existing connections, we can make predictions about the effect on the network of 
accepting new connections. Using real traffic collected from a network, we compare 
the performance of this CAC scheme with that of the Mosquito [CLM+97] algorithm
which is based on estimation of effective bandwidths.



Keywords 

Connection admission control, large deviations, shape function, effective bandwidth, 
queueing systems 



1 INTRODUCTION 

Asynchronous Transfer Mode (ATM) allows for statistical multiplexing of traffic
from many different applications, having widely differing characteristics. This is
advantageous from the network's point of view as it affords the opportunity for
dramatically increased utilisation of resources: more applications can be allowed to make
use of the network than peak rate allocation would suggest if the traffic is buffered at 
switches and multiplexing points during overload periods. The choice of the buffer 
size places a hard bound on the queueing delay and so the problem for the network is 
in deciding how many connections can safely be accepted while keeping the cell-loss 
ratios sufficiently low. 

Connection Admission Control (CAC) is concerned with trying to assess the
impact on the network of accepting a new connection, a problem which has been
discussed extensively in the literature [MP90, App90, GKK95, Key95, CKR+91,
JDSZ97]. One approach which has received recent attention uses the notion of the
effective bandwidth of a traffic source [Hui88, Kel96, Kel91, GAN91, GH91, CKR+91,
DLO+94, DLO+95, CLL+95, CHL+95]. This approach is based on the theory of
Large Deviations, a probabilistic theory of rare events, which, when applied to queue- 
ing systems, can help to quantify the intuitive notion of bandwidth requirement. The 
term ‘effective bandwidth’ refers to a particular function which provides a conserva- 
tive estimate of the bandwidth requirement of a source. This function depends on the 
QoS constraints in a simple manner and on the statistical properties of the traffic in 
a complex manner; if the sources are independent, these functions are additive. 

However, the additive nature of the effective bandwidth function means that it fails 
to reflect economies of scale arising from statistical multiplexing. This is because it 
is based on large buffer asymptotics. An alternative approach [WD97], based on the 
asymptotics associated with a large number of connections, involves estimation of 
the shape-function [BD95] for the multiplexed traffic. In the CAC algorithm intro- 
duced in [CLM+97], the decision to accept or reject a proposed connection is based 
on its declared parameters and on-line estimation of the bandwidth requirement of 
existing connections. For the on-line estimation of bandwidth requirement, a variety 
of estimators may be used. In this paper, we compare the effectiveness of two
estimators: the shape-function estimator [WD97] and the Mosquito estimator [CLM+97].

The practical estimation of bandwidth requirements is a difficult problem, as it 
depends in a complex way on the statistical properties of the traffic. The problem 
is typically approached in two ways. One approach [EMS91, MASR88, FAT94] is 
to assume a parametric model of the traffic and to fit parameters for the connec- 
tion in question. This parameter fitting can be done based on information declared 
by the connection when it requests admission, or measurements made on the traffic 
generated by the connection, or a combination of both. Once the detailed model is 
completed, the estimate can be calculated. However, there are problems with this ap- 
proach. Firstly, unless on-line measurement is employed, the application is required 
to deliver a detailed self-characterisation before it has transmitted any traffic. Further- 
more, given such a characterisation, the network still has to fit parameters to a model 
which adequately describes the traffic source. This may prove a difficult problem,
the solution of which contains redundant information if what is actually required is
just a knowledge of the bandwidth requirement. Finally, it is impossible to tell what 
types of traffic it may be necessary to transmit in the future, and it should not be 
required that each new traffic type be submitted to a complex modeling process in 
advance of transmission. 

An alternative approach [GK97, CKR+91, DLO+94, JDSZ97, JDSZ95, Flo96] 
is to attempt to measure the bandwidth requirement more directly. This avoids the 
problem of requiring new traffic types to specify a parameterised model in advance 
and removes the estimation of redundant information. Perhaps most importantly, this 
approach requires very little declared information on the part of the application; in 
the present scheme only declaration of the peak-rate is necessary. However, when- 
ever additional parameters are declared they can be incorporated in the estimation 
of the bandwidth requirement, thereby offering increased efficiency. Hence, this
approach can use a model estimated from declared parameters, if one is available, as
an initial estimate of the bandwidth requirement, which is then refined by on-line 
measurement. However, in the absence of such a model, a more conservative initial 
estimate can easily be made from the declared peak-rate. 



2 THEORETICAL FRAMEWORK 

Our central concern is the loss of cells due to overflow at a buffer. Consider a
multiplex of $N$ ATM streams arriving at a buffer which has finite storage capacity $B$.
Cells are removed from the buffer at fixed rate $S$, the line-rate. Each traffic stream
has a finite duration, as might be expected for calls of finite length. Associated with
each sample of traffic is a cell-loss ratio between zero and one (that is, the ratio of
cells lost to those that arrive); we denote the cell-loss ratio for a multiplex of $N$ lines
with a buffer-size $B$ and a line-rate $S$ by CLR($N$, $B$, $S$). Experience with a wide
variety of traffic-sources shows that the logarithm of CLR($N$, $bN$, $sN$) is
asymptotically linear in the number of sources $N$ if the line-rate per source $s$ and buffer size
per source $b$ are fixed. A typical example is shown in Figure 1 which plots, for a
set of motion JPEG sources on the Fairisle ATM network at Cambridge [BLM94],
the logarithm of the observed cell-loss ratio against number of sources $N$ when the
buffer size and line-rate are scaled appropriately. This demonstrates the multiplexing
gain available in shared resource systems due to the statistical properties of the
individual traffic streams. For example, if one doubles the number of (identical) sources
to be multiplexed, one need not, generally, double the rate and buffer size in order to
maintain the same CLR.

The general features of a plot of the logarithm of cell-loss ratio against number of
sources are explained by queueing theory. We model the arrival streams as stationary
stochastic processes $\{A_t^n\}$, the arrivals processes; here $A_t^n$ denotes the total number
of cells which have arrived up to time $t$ from source $n$. The following scaling
behaviour of queue-tail probabilities has been shown to hold for a very general class of
traffic models [BD95, CW96, Duf96]. When $N$ sources, satisfying certain assumptions
detailed below, feed a buffer of size $Nb$ which is being served at rate $Ns$, the
proportion of cells lost will satisfy the logarithmic asymptotic

\[ \log \mathrm{CLR}(N, Nb, Ns) \sim -N I(b), \quad \text{as } N \to \infty, \qquad (1) \]

where the shape function $I(b)$ depends on $s$ and the detailed traffic characteristics.
Although equation (1) describes the behaviour of the CLR as $N$ goes to infinity, we
would expect that $\log \mathrm{CLR}(N, Nb, Ns) \approx -N I(b)$ for $N$ large, but finite. One
useful feature of (1) is that it does not require that $b$, the buffer allocation per source,
be large. Thus it can be used to describe cell-level, as well as burst-level, queueing
behaviour. In this, it is distinguished from the large body of results about the
asymptotic behaviour of tail probabilities for large $b$, and the consequent effective
bandwidth approximation. Another useful feature of (1) is that it holds for a wider
class of traffic than the corresponding result for large $b$ asymptotics. It does not
require that the traffic be mixing and, thus, the estimation based on it, described in this
paper, should be valid for long-range dependent traffic [Duf96, Duf97].

Figure 1 Empirical loss probabilities for samples of JPEG coded video (horizontal
axis: number of sources).

The main condition we require of the multiplex of arrivals processes is that the
limit of the finite-time cumulant generating function (CGF) per source exists for
each time-scale $t$ as $N$ goes to infinity:

\[ \lambda_t(\theta) := \lim_{N \to \infty} \frac{1}{Nt} \log \mathbb{E}\, e^{\theta \sum_{n=1}^{N} A_t^n}. \qquad (2) \]

The finite-time CGFs are related to the effective bandwidth of the sources:
$\alpha(\theta) = \theta^{-1} \lim_{t \to \infty} \lambda_t(\theta)$. We shall discuss effective bandwidths more fully in Section 5.

The assumption above is satisfied by i.i.d. superpositions and also by heterogeneous
superpositions where the proportion of each type of source is held constant. In
the case of independent heterogeneous superpositions of $J$ types of sources indexed
by $j \in \{1, \ldots, J\}$, we have that $\lambda_t(\theta) = \sum_{j=1}^{J} p_j \lambda_t^{(j)}(\theta)$, where $p_j$ is the proportion
of sources of type $j$ and $\lambda_t^{(j)}$ are the finite-time CGFs of a source of type $j$ defined
analogously to (2) as

\[ \lambda_t^{(j)}(\theta) := \frac{1}{t} \log \mathbb{E}\, e^{\theta A_t^{(j)}}. \qquad (3) \]

Thus, when we perform estimation based on (1) we need not assume that the traffic
sources are identical.

Under the above assumption, (1) holds with the shape function $I$ given by

\[ I(b) = \min_{t > 0}\, (t\lambda_t)^*(b + st), \qquad (4) \]

where $f^*(x) := \max_{y \in \mathbb{R}} \{xy - f(y)\}$ is the Legendre-Fenchel transform of the
function $f$; see [BD95, CW96, Duf96]. The time $\tau = \arg\min_{t > 0} (t\lambda_t)^*(b + st)$ at
which the minimum above is attained is called the critical time-scale and may be
interpreted as the most likely time-scale on which the buffer overflows. The
theoretical result above offers an explanation of the observed asymptotic behaviour of the
logarithm of the cell-loss ratio as a function of the number of sources and relates
the slope of its linear asymptote to the CGF of a stochastic process representing the
traffic.

The bandwidth requirement of a sample of traffic is defined to be the minimum
line-rate at which a target cell-loss ratio $c$ is not exceeded in a buffer of storage
capacity $B$:

\[ \mathrm{BWR}(N, B, c) := \min\{S : \mathrm{CLR}(N, B, S) \le c\}. \qquad (5) \]

Notice that this is an operational definition which does not involve any statistical 
theory; for a given trace, it can be determined empirically by trial and error. 
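The trial-and-error determination of the bandwidth requirement for a given trace can be sketched as follows. The queue model used here (a discrete-time fluid buffer, with one arrival amount per slot) and the function names `cell_loss_ratio` and `bandwidth_requirement` are assumptions of this sketch, not the measurement procedure of the paper. The bisection relies on the loss ratio not increasing when the line-rate increases.

```python
def cell_loss_ratio(arrivals, buffer_size, line_rate):
    """Fraction of arriving work lost in a finite buffer served at a fixed rate.

    arrivals: amount of traffic arriving in each time slot (same unit as buffer_size)
    line_rate: amount of traffic removed from the buffer per slot
    """
    queue = 0.0
    lost = 0.0
    total = float(sum(arrivals))
    for a in arrivals:
        queue = max(0.0, queue + a - line_rate)
        if queue > buffer_size:          # overflow: the excess is lost
            lost += queue - buffer_size
            queue = buffer_size
    return lost / total if total > 0 else 0.0


def bandwidth_requirement(arrivals, buffer_size, target_clr, tol=1e-6):
    """Smallest line rate (to within tol) whose loss ratio does not exceed the target."""
    lo, hi = 0.0, float(max(arrivals))   # serving at the peak rate loses nothing
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cell_loss_ratio(arrivals, buffer_size, mid) <= target_clr:
            hi = mid
        else:
            lo = mid
    return hi


trace = [2, 0, 2, 0]                     # toy on/off trace, illustrative only
print(bandwidth_requirement(trace, buffer_size=1, target_clr=0.0))
```

Buffering lets the toy on/off source be served at its mean rate instead of its peak rate, which is the multiplexing effect the definition is meant to capture.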

We are interested in estimating the bandwidth requirement of the multiplex.
Suppose we have some estimate $\hat{I}$ of the shape-function $I$. Then we may use this estimate
to obtain an estimate of the CLR for any value of $S$:

\[ \widehat{\mathrm{CLR}}(N, B, S) := e^{-N \hat{I}(B/N)}, \]

with $\hat{I}$ evaluated at the line-rate per source $s = S/N$. This leads to a natural estimate
of the bandwidth requirement: we adjust $S$ until our
estimated CLR just matches the target CLR:

\[ \widehat{\mathrm{BWR}}(N, B, c) := \min\{S : \widehat{\mathrm{CLR}}(N, B, S) \le c\}. \]

In the next section we shall explain how to estimate the shape-function.



3 ESTIMATING THE SHAPE-FUNCTION 

Given a sample realization $\{X_1, X_2, \ldots\}$ of a traffic stream we may estimate the
finite-time CGFs of its source as follows. First form all blocks of length $t$:

\[ \tilde{X}_1 = \sum_{i=1}^{t} X_i, \qquad \tilde{X}_2 = \sum_{i=2}^{t+1} X_i, \qquad \ldots \]

Assuming stationarity of the arrival process, we use these overlapping blocks to get
an estimate of $\lambda_t$ by replacing the expectation in (3) with an empirical mean:

\[ \hat{\lambda}_t^{(j)}(\theta) := \frac{1}{t} \log \frac{1}{K} \sum_{k=1}^{K} e^{\theta \tilde{X}_k}, \]

where $K$ is the number of blocks formed. We assume that the sources are
independent and so we may combine the estimates to form an estimate of $\lambda_t$:

\[ \hat{\lambda}_t(\theta) := \frac{1}{N} \sum_{j=1}^{N} \hat{\lambda}_t^{(j)}(\theta). \]

Then for each $t$ we merely perform the minimisation and Legendre transform,
mirroring (4), in order to form the estimate

\[ \hat{I}(b) := \min_t\, (t\hat{\lambda}_t)^*(b + st). \]

It is worth distinguishing this procedure from that of Section 5 where, for some
(large) $T$, $\hat{\lambda}_T$ will be used to estimate the limiting CGF, $\lambda(\theta) = \lim_{t \to \infty} \lambda_t(\theta)$,
and hence estimate the effective bandwidth by $\hat{\alpha}(\theta) := \theta^{-1} \hat{\lambda}_T(\theta)$. There is a
trade-off here in choosing a value for $T$: too small a value and $\hat{\lambda}_T(\theta)$ will not be close
enough to $\lambda(\theta)$, too large a value and the variance of the estimator will be large
(see [Gan96]). Also, it is difficult to automate the choice of block size. However, the
shape-function estimator does not suffer from this problem: the value of $t$ where the
minimum is empirically observed to occur provides an estimate of the critical
time-scale $\tau$, so that this method automatically picks out the time scale relevant to buffer
overflow.
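The estimation procedure of this section can be sketched for a single source as follows. Several choices here are assumptions of the sketch rather than part of the method as published: the Legendre transform is approximated by a maximum over a finite grid of non-negative values of $\theta$, the minimisation runs over $t = 1, \ldots, t_{max}$, and the function names `block_sums`, `log_mgf` and `shape_function` are hypothetical.

```python
import math


def block_sums(trace, t):
    """Sums of the trace over all overlapping blocks of length t."""
    sums = [sum(trace[:t])]
    for k in range(t, len(trace)):
        sums.append(sums[-1] + trace[k] - trace[k - t])
    return sums


def log_mgf(trace, t, theta):
    """Empirical log E exp(theta * X_t), i.e. t times the estimated CGF (theta >= 0)."""
    sums = block_sums(trace, t)
    peak = max(sums)
    # Subtract the largest block sum before exponentiating to avoid overflow.
    mean_exp = sum(math.exp(theta * (x - peak)) for x in sums) / len(sums)
    return theta * peak + math.log(mean_exp)


def shape_function(trace, b, s, max_t, thetas):
    """Estimate I(b) and the critical time-scale for buffer b and rate s per source."""
    best, best_t = math.inf, None
    for t in range(1, max_t + 1):
        # Legendre transform of t*lambda_t evaluated at b + s*t, on the theta grid.
        legendre = max(th * (b + s * t) - log_mgf(trace, t, th) for th in thetas)
        if legendre < best:
            best, best_t = legendre, t
    return best, best_t


trace = [2, 0] * 50                          # toy on/off trace, illustrative only
thetas = [0.1 * k for k in range(51)]
print(shape_function(trace, b=0.2, s=1.5, max_t=5, thetas=thetas))
```

The second value returned is the time-scale at which the minimum occurs, which plays the role of the estimated critical time-scale $\tau$.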



4 THE CAC ALGORITHM 

In this section we summarise the description, given in [CLM+97], of a practical
CAC algorithm which makes use of on-line estimation of bandwidth requirement. A
measurement based CAC algorithm will, in general, need to combine measurements
on the current multiplex with declared parameters from the new call request in order
to judge whether there is sufficient bandwidth available to satisfy the QoS
requirements. The task of measuring the bandwidth requirement of the current traffic mix
is performed by the estimator. In this paper, we compare the effectiveness of several
estimators, making use for the first time of the shape-function estimator:

\[ \widehat{\mathrm{BWR}}(Nb, c) := \min\{ sN : e^{-N \hat{I}(b)} \le c \}, \]

where $c$ is the target CLR. For the new call there is no traffic record available and so
the CAC algorithm must base its estimate on the call's declared parameters. These
are passed to the predictor. In this paper we consider a predictor which requires a
new call request to declare only its peak rate; however, it is possible to use further
information such as the GCRA parameters defined by the ATM Forum [ATM95].

Figure 2 Operation of the CAC algorithm.

Figure 2 shows the behaviour of the CAC algorithm. Given a current multiplex 
of calls, the system estimates the bandwidth requirement of the multiplex using its 
chosen estimator (depicted here by a thermometer) and the declared parameters of 
the connection request. If the estimated bandwidth requirement is less than the link 
capacity, then the connection can be accepted without violating the QoS of any calls. 
As soon as the new call commences, the estimator uses measurements of the new 
multiplex to revise its estimate of the current bandwidth requirement. When the next 
call attempt arrives, the procedure is repeated, as shown. If a new call attempt arrives 
before the algorithm has developed an accurate estimate of the new bandwidth re- 
quirement the algorithm acts conservatively. It uses the most recent stable estimate 
of the bandwidth requirement, plus the sum of the peak rates of all subsequently 
admitted calls. 
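The admission logic described above can be sketched as a small class. This is only an illustration under stated assumptions: the class and method names are hypothetical, the predictor uses the declared peak rate alone, and a call is accepted when the resulting requirement does not exceed the link capacity.

```python
class MeasurementBasedCAC:
    """Sketch of the accept/reject rule with a peak-rate predictor."""

    def __init__(self, link_capacity):
        self.link_capacity = link_capacity
        self.stable_estimate = 0.0   # most recent stable bandwidth requirement estimate
        self.pending_peaks = 0.0     # peak rates of calls admitted since that estimate

    def update_estimate(self, measured_requirement):
        """Called when the estimator delivers a new stable measurement."""
        self.stable_estimate = measured_requirement
        self.pending_peaks = 0.0

    def request(self, peak_rate):
        """Accept the call if the conservative requirement still fits on the link."""
        needed = self.stable_estimate + self.pending_peaks + peak_rate
        if needed <= self.link_capacity:
            self.pending_peaks += peak_rate
            return True
        return False


cac = MeasurementBasedCAC(link_capacity=100.0)
print(cac.request(40.0), cac.request(40.0), cac.request(40.0))
cac.update_estimate(30.0)            # measurement shows the two calls need far less
print(cac.request(40.0))
```

Until a fresh estimate arrives, admitted calls are accounted for at their peak rate, which is the conservative behaviour described in the text.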



5 THE MOSQUITO ESTIMATOR 

In this section we review the Mosquito estimator proposed in [CLM+97]. Unlike
the shape-function estimator, which is based on the large $N$ asymptotics of the
system, the Mosquito estimator is based on the large buffer asymptotics. The crucial
observation is that for stationary and mixing arrivals processes the loss ratio decays
exponentially with buffer size:

\[ \mathrm{CLR}(B, S) \asymp e^{-\delta(S) B}. \]

Since the number of sources being multiplexed is now constant we have dropped the
reference to $N$. The decay rate $\delta$ is determined by the line-rate and by the CGF of
the multiplex:

\[ \delta(S) = \max\{\theta : \lambda(\theta) \le S\theta\}. \]

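$\lambda$ may be estimated by choosing a block length $T$ large enough so that the aggregate
arrivals $A_T = \sum_{n=1}^{N} A_T^n$ of the multiplex are approximately independent; in this
case

\[ \lambda(\theta) := \lim_{t \to \infty} \lambda_t(\theta) \approx \lambda_T(\theta) := \frac{1}{T} \log \mathbb{E}\, e^{\theta A_T}. \]

As before, the finite-time CGF $\lambda_T$ is estimated using the empirical distribution of
the blocked arrivals

\[ \hat{\lambda}_T(\theta) := \frac{1}{T} \log \frac{1}{K} \sum_{i=1}^{K} e^{\theta \tilde{X}_i}, \]

where $\tilde{X}_i$ are the activities in each block. Using the estimate of $\lambda$ we can estimate
the bandwidth requirement; because of the properties of $\lambda$ this takes the particularly
simple form

\[ \widehat{\mathrm{BWR}}(B) := \min\{S : -\delta(S) B \le \log c\} = \frac{\hat{\lambda}(\delta^*)}{\delta^*}, \]

where $\delta^* = -\log c / B$. The function $\lambda(\theta)/\theta$ is the effective bandwidth function
of the arriving traffic stream and the estimator based on it is called the effective
bandwidth estimator.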
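The simple effective bandwidth estimator can be sketched as below. The use of non-overlapping blocks of length $T$ and the function name `effective_bandwidth_requirement` are assumptions of this sketch; the refinements used by the Mosquito estimator are not included.

```python
import math


def effective_bandwidth_requirement(trace, block_len, buffer_size, target_clr):
    """Estimate the bandwidth requirement as lambda_T(delta*) / delta*.

    trace: aggregate arrivals per time slot
    block_len: block length T (assumed large enough for blocks to decorrelate)
    target_clr: target cell-loss ratio c, with 0 < c < 1
    """
    delta_star = -math.log(target_clr) / buffer_size
    # Activity in each non-overlapping block of length T.
    blocks = [sum(trace[k:k + block_len])
              for k in range(0, len(trace) - block_len + 1, block_len)]
    peak = max(blocks)
    # Empirical mean of exp(delta* X), shifted by the peak to avoid overflow.
    mean_exp = sum(math.exp(delta_star * (x - peak)) for x in blocks) / len(blocks)
    cgf = (delta_star * peak + math.log(mean_exp)) / block_len
    return cgf / delta_star


trace = [2, 0] * 50                      # toy on/off trace, illustrative only
print(effective_bandwidth_requirement(trace, block_len=1, buffer_size=10, target_clr=1e-3))
```

For any trace the result lies between the mean rate and the peak rate, and it moves towards the peak as the buffer shrinks or the target loss ratio tightens.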

In practice the effective bandwidth estimator has been found to be too pessimistic.
This is because, for finite buffer size, the approximation CLR($B$) $\approx \exp(-B\delta)$ is
not exact. An improvement can be made by introducing a pre-factor $e^{-\mu(S)}$ so that
the estimate of the bandwidth requirement becomes

\[ \widehat{\mathrm{BWR}}(B) := \min\{S : -\mu(S) - \delta(S) B \le \log c\}. \]

$\delta$ is estimated as before and $\mu$ is estimated by simulating the system and observing
the cell loss in a buffer of size zero.

For a simple Markovian model, Figure 3 shows the simple and refined effective
bandwidth approximations, and also the shape-function approximation described in
the previous section. These are compared with the true value of the CLR produced
from simulations.

Figure 3 Estimates of the CLR for Markovian traffic using three different
approximations plotted against $B$ for fixed $N$ and $S$.

As well as using the refined effective bandwidth approximation, the Mosquito
estimator also uses knowledge of the peak rate to make the estimation of $\lambda$ more
robust; see [CLM+97] for details.



6 SIMULATION RESULTS 

This section presents the results of simulation experiments using the CAC algorithms 
of Sections 4 and 5. The aim of the experiments was to evaluate the performance of 
our approach with respect to several criteria which we discuss below. 

Firstly, we are interested in the performance of each algorithm in terms of the re- 
source utilisation it achieves. A pessimistic CAC algorithm which allocates resources 
using the declared peak rate of each source can guarantee that the loss constraints will 
always be met. Our algorithms, however, attempt to increase the link utilisation by 
admitting as many calls as possible whilst still maintaining the QoS guarantee. We 
thus need to compare our approach with the pessimistic system, and with a system 
which is optimal in the sense that the CAC algorithm is assumed to have complete 
knowledge of the statistical properties of every connection requesting admission. 
The optimal CAC should achieve maximal link utilisation while ensuring that QoS 
constraints are met. Secondly, we are interested in the effectiveness of each CAC 
algorithm in terms of its ability to guarantee the QoS constraints of the traffic. 

Simulation Model. In each of our simulations we model a single output buffer and
transmission link of an ATM switch. The link speed used is 100 Mb/s, which
corresponds to the TAXI transmission rate for the Fairisle ATM network at Cambridge.

The traffic source we use is one which has been widely used in performance stud- 
ies of ATM systems in the literature, namely a trace of the activity of the Star Wars 
movie. The Star Wars data set was produced by Garrett and Vetterli at Bellcore, and 
has been studied in some detail, for example in [GW94, CLG94] and [CLL+95]. It 
comprises the information content in bytes per frame for about 2 hours of the film, 
as transmitted by a DCT based codec similar to JPEG. The byte count per frame is 
broken down into “slices” of 16 video lines. With 30 slices per frame and 24 frames
per second, one slice represents the information transmitted in about 1.4 milliseconds.
The Star Wars traffic has been shown to exhibit signs of long-range dependent
behaviour, making it potentially difficult for a measurement based CAC to cope with.
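
The slice duration quoted above follows directly from the frame structure; a minimal check, with constant names chosen here for illustration:

```python
SLICES_PER_FRAME = 30
FRAMES_PER_SECOND = 24

# One slice is 1 / (30 * 24) of a second.
slice_duration_ms = 1000.0 / (SLICES_PER_FRAME * FRAMES_PER_SECOND)
print(f"{slice_duration_ms:.2f} ms per slice")  # prints "1.39 ms per slice"
```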

The Star Wars traces were constructed by encoding each slice as a single AAL5 
PDU which was transmitted (and traced) over Fairisle; cells from each slice were 
transmitted at a CBR rate equal to the slice rate, reducing the peak rate from the line 
rate to about 24.2 Mb/s for the worst slice. The mean rate is about 5.3 Mb/s. The 
buffer is sized according to the requirement that no frame would be delayed more 
than a frame time; this results in a maximum buffer size of 500 cells. We used a CLR
constraint of $10^{-4}$ for all of the results presented in this paper.

Call Model. We study a scenario in which calls of a particular traffic type arrive 
according to an exponential inter-arrival time distribution, an assumption which ap- 
pears to be well founded [PF94]. In the absence of real-world data we have used call 
lengths which are exponentially distributed. Calls arrive at a high (Poisson) rate with 
mean 5 calls/s. Blocked calls are lost, but the high arrival rate means that the system 
is continually faced with new call attempts. We thus expect the system to remain 
close to maximum utilisation. 

Calls have an exponentially distributed length with a mean length of 60 seconds. 
Each accepted call transmits a trace which is derived by randomly selecting a start 
point in the Star Wars movie. While it is possible that two connections may simul- 
taneously read from the same part of the movie, this will only happen very rarely. 
Hence, we do not expect the resulting correlations between calls to significantly af- 
fect our results. 
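
The call model can be sketched as a small event-driven simulation. This is an illustration only: the admission decision is replaced by a fixed cap `max_calls` (an assumption standing in for the CAC algorithm), and all names are hypothetical.

```python
import heapq
import random

def simulate_calls(max_calls, arrival_rate=5.0, mean_holding=60.0,
                   duration=3600.0, seed=1):
    """Blocked-calls-lost simulation with Poisson arrivals and exponential
    holding times. Returns (time-average calls in progress, blocked fraction)."""
    rng = random.Random(seed)
    departures = []        # min-heap of departure times of calls in progress
    now, area = 0.0, 0.0   # area accumulates (calls in progress) x time
    offered = blocked = 0
    next_arrival = rng.expovariate(arrival_rate)
    while next_arrival < duration:
        # Process every departure that happens before the next arrival.
        while departures and departures[0] <= next_arrival:
            t = heapq.heappop(departures)
            area += (len(departures) + 1) * (t - now)
            now = t
        area += len(departures) * (next_arrival - now)
        now = next_arrival
        offered += 1
        if len(departures) < max_calls:   # stand-in for the CAC decision
            heapq.heappush(departures, now + rng.expovariate(1.0 / mean_holding))
        else:
            blocked += 1                  # blocked calls are lost
        next_arrival = now + rng.expovariate(arrival_rate)
    return area / now, blocked / offered
```

With 5 calls/s and a mean holding time of 60 s the offered load is 300 Erlangs, far above what the link can carry, so the number of calls in progress stays close to whatever limit the admission rule imposes.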

Results. For each of the CAC algorithms discussed, Figure 4 shows the distribution
of the number of connections in progress during the simulation. The left-most
histogram was made using peak-rate admission control; this algorithm admits to the
system an average of 7.92 connections. From left to right, the other three histograms
were made using the simple effective bandwidth algorithm, the shape-function
algorithm, and the Mosquito algorithm. The advantage gained by exploiting statistical
multiplexing is clearly apparent: the CAC algorithm, using any of the three
estimating techniques, admits significantly more calls than the peak rate allocation scheme.
As was found in [CLM+97], the Mosquito estimator, which makes use of the
intercept $\mu$, is less conservative than the simple effective bandwidth estimator. In terms
of the number of connections admitted, the performance of the shape-function
estimator lies between the other two estimators. This is consistent with Figure 3; we
are thus led to believe that it is the different approximations involved in making the
estimates which give rise to the differences in performance.

Figure 4 Histogram of the number of calls in progress over the simulation. The
mean numbers of calls in progress under the admission schemes were: 7.92
under peak rate, 13.0 under simple effective bandwidth, 13.6 under shape-function,
and 14.6 under Mosquito.

To obtain bounds on an optimal admission scheme we multiplexed as many seg- 
ments of the source traffic as possible such that the CLR constraint was met over the 
entire experiment. In other words, we found empirically the number of calls which 
could be multiplexed for a given BWR (the link-rate) and CLR as defined in Equa- 
tion (5). We found this number to vary between 14 and 16 depending on the sample 
of traffic used. It is clear that the three algorithms, particularly the Mosquito algo- 
rithm, perform very close to optimally. 

Turning to the distribution of cells lost under the three admission schemes we 
find that no loss occurred when using the Simple Effective Bandwidth or the Shape- 
Function algorithms. This indicates that the Simple Effective Bandwidth algorithm is 
too conservative: greater link utilisation can be achieved by using the Shape-Function 
algorithm without in any way compromising QoS. Since the Mosquito algorithm 
admits more connections than either the Simple Effective Bandwidth or the Shape- 
Function algorithms we might expect this algorithm to display a higher CLR. We 
find that this indeed is the case. Mosquito lost 0.0032% of the cells over the length
of the simulation, which is below, but close to, the target CLR of 0.01%. However, this
obscures the fact that most connections experience no cell loss while others lose many
cells. Overall, 4.05% of the connections lost more than the target CLR under the
Mosquito admission policy. Figure 5 shows the cell loss ratio for each call, against
call number, and Figure 6 shows the distribution of the CLRs of individual calls. It
is apparent that cell loss occurs in bursts. This is the result of our cell loss allocation
scheme: when the buffer overflows an equal number of cells are discarded from each
connection that is in progress at that time.

Figure 6 Distribution of CLR per connection under Mosquito. The CLR over the
length of the experiment was $3.2 \times 10^{-5}$ and 4.05% of calls lost more than the target
CLR of $10^{-4}$.

Conclusions and Future Work. We have seen how the CAC algorithm described
in [CLM+97], using any one of three estimators, can significantly improve on peak
rate allocation. The shape-function estimator has been found to perform better than
the simple effective bandwidth estimator but not as well as the Mosquito estimator in
terms of the number of connections admitted. However, unlike Mosquito, the
shape-function algorithm did not cause any calls to exceed their target CLR.

A possible improvement to the shape-function estimator would be to combine its
estimates of the shape-function with observations of cell loss in a simulated buffer
in a manner similar to the way in which the Mosquito algorithm uses estimates of
$\mu$. The cell loss under such a scheme will require investigation. Alternative methods
of measuring $\mu$ in the Mosquito algorithm have been suggested and deserve further
attention. The performance of all three algorithms under different assumptions about
the connection arrival process, and using a greater variety of traffic types, needs to
be examined.

Abstract

This paper describes a connection admission control (CAC) algorithm for 
ATM networks supporting different Quality of Service (QoS) classes, and il- 
lustrates its effectiveness with simulation results considering both the call-level 
and the cell-level dynamics. 

The CAC algorithm groups connection requests in three different QoS 
classes: i) Class 1: with stringent CLR (Cell Loss Ratio) and CDV (Cell Delay 
Variation) requirements; ii) Class 2: with stringent CLR requirements, but no 
need for CDV guarantees; iii) Class U: with no need for guarantees on either 
CLR or CDV. Both Constant Bit Rate (CBR) and Variable Bit Rate (VBR) 
connections can request admission as either Class 1 or Class 2, depending on 
their QoS requirements. Unspecified Bit Rate (UBR) and Available Bit Rate 
(ABR) connections instead normally request admission as Class U. 

The investigation of the effectiveness of the CAC algorithm is based on the 
simulation of an ATM network with parking lot topology, and with variable 
parameter values. 

The relationship between the proposed CAC algorithm and equivalent bandwidth
(EB) CAC algorithms described in the literature is discussed.

1 INTRODUCTION



In competitive telecommunication markets, the success of a network operator 
largely depends on two factors: i) the Quality of Service (QoS) provided to the 
end users, and ii) the cost of the service utilization for the end users. These 
two factors on the one hand determine the end user satisfaction, and on the 
other hand are determined by the effectiveness of the algorithms adopted by 
the network operator for the exploitation of the network resources. 

In order to leave space for the competition among network operators, stan- 
dardization bodies in many cases refrain from the standardization of those 
algorithms whose impact on the efficient utilization of network resources is 
crucial. Several examples of such algorithms are possible, and some of the 
most important surely concern algorithms for the implementation of the Con- 
nection Admission Control (CAC) functions. 

The goal of CAC functions is achieving the best possible exploitation of 
network resources, while guaranteeing that the QoS remains above the level 
promised to the end user. CAC functions thus play a central role in guaran- 
teeing the customer satisfaction. 

In the particular case of ATM networks, the role of CAC functions is even 
more delicate than in traditional circuit-switched networks, since the variabil- 
ity of the telecommunication services offered by the network implies a very 
wide range of data rates and several QoS requirements. 

Several different proposals of CAC algorithms for ATM networks have appeared
in the literature; the most widely used approaches are based either on the
definition of an equivalent bandwidth for each connection, or on the actual
measurement of the bandwidth used by active connections [1, 2, 3, 4, 5].

Although it is intrinsically impossible to define the optimality of a CAC 
algorithm for ATM networks, some requirements for a “good” CAC algorithm 
are evident. A good CAC algorithm for ATM networks should 



1. achieve high resource utilization; 

2. allow the exploitation of statistical multiplexing; 

3. guarantee the QoS promised to the end user; 

4. be robust with respect to traffic fluctuations; 

5. be based on simple traffic descriptors; 

6. be simple to implement; 

7. provide immediate answers. 



In this paper we illustrate and evaluate by simulation a simple CAC algo- 
rithm for ATM networks that explicitly considers the presence of connections 
with different QoS requirements. 

QoS requirements are confined to a set of predefined classes, and the traffic
descriptor used by the CAC is independent of the link or node characteristics,
so that there is no need to fine tune the algorithm when network
characteristics change.

The proposed CAC is suitable for an adaptive implementation since its 
behavior depends on very few parameters, whose physical significance is easily 
understood and can be modified according to cell-level QoS measurements. 
Adaptive versions of the CAC algorithm, however, are not discussed in this 
paper, and are left for further study. 

The paper is organized as follows. The CAC algorithm is described and 
discussed in Section 2. The simulation tools used for the assessment of the 
effectiveness of the CAC algorithm are described in Section 3, while the net- 
working scenario that is used as a testbed for the simulation experiments is 
presented in Section 4. Simulation results are presented in Section 5. Finally, 
Section 6 ends the paper with some concluding remarks. 



2 THE CAC ALGORITHM 

The key goal of the CAC algorithm is to decide about the admission of con- 
nections with different QoS requirements, using a set of simple equations and 
a very limited amount of information, namely only the peak cell rate (PCR)
and sustainable cell rate (SCR) of admission-requesting connections.
Specifically, three QoS classes are considered: 

Class 1: with stringent CLR (Cell Loss Ratio) and CDV (Cell Delay Varia- 
tion) requirements; 

Class 2: with stringent CLR requirements, but no need for CDV guarantees;

Class U: with no guarantees on either CLR or CDV.

The QoS as defined by ITU [6] is described through a fairly complicated 
set of parameters, each one ranging on a large number of possible values. It is 
clear that the simple 3-class QoS scenario we are assuming cannot support the 
complete set of QoS vectors defined in [6]. It must be argued, however, that 
most users and applications cannot be expected to be capable of describing 
the QoS they need in such fine detail, and most networks will probably not 
offer the whole range of possible QoS alternatives. 

The mapping of ATM transfer capabilities, namely Constant Bit Rate (CBR),
Variable Bit Rate (VBR), Unspecified Bit Rate (UBR), and Available Bit Rate
(ABR), onto QoS classes is not trivial. However, one can argue that CBR and
real-time VBR (rt-VBR) should be admitted as Class 1, non-real-time VBR
(nrt-VBR) as Class 2, and best-effort services like ABR and UBR as Class
U. CBR connections can, in principle, require admission as Class 2; however,
a CBR connection that is not guaranteed a low CDV probably will not
appeal to many users. Therefore, we may safely assume that only nrt-VBR calls
request admission as Class 2.

The CAC algorithm is run for each link that a connection should use; the
rule to decide about the admission of a connection is quite simple:

• For each link, Class 1 and 2 connections are accepted if:

$$\sum_{\mathcal{C}_1} \mathrm{PCR} + \sum_{\mathcal{C}_2} \mathrm{MfCR} \le \alpha C \tag{1}$$

where $\mathcal{C}_1$ and $\mathcal{C}_2$ are the sets of connections in Classes 1 and 2, respectively,
$C$ is the link capacity, and $\alpha$ is a protection coefficient ($\alpha \le 1$) that can be
set so as to avoid that the whole link capacity is used by connections of
Classes 1 and 2. MfCR stands for Modified Cell Rate, and is a parameter
characterizing connections with a cell rate value midway between PCR and
SCR, according to a factor $\gamma$; that is:

$$\mathrm{MfCR} = \mathrm{SCR} + \gamma (\mathrm{PCR} - \mathrm{SCR}) \tag{2}$$

Since connections can exhibit a substantially different behavior for different
degrees of burstiness, picking a similar range of $\gamma$ when deciding about the
admission of more or less bursty connections can result in a misleading
characterization of their behavior; $\gamma$ should thus be regarded as a function
of the burstiness. In our CAC scheme, we introduce a simple relationship
between $\gamma$ and the connection burstiness (computed as $B = \mathrm{PCR}/\mathrm{SCR}$),
in the form of $\gamma = \gamma_0 / B$. “Sensible” values for $\gamma_0$ range between 0 and
$B$ (clearly, $\gamma_0 < 0$ would result in bandwidth under-allocation, and $\gamma_0 > B$
would entail a bandwidth over-allocation for bursty connections, thus
reducing link utilization). Thus we have:

$$\mathrm{MfCR} = \mathrm{SCR} + \frac{\gamma_0}{B} (\mathrm{PCR} - \mathrm{SCR}) \tag{3}$$

A discussion of the relationship between the expression of MfCR and frequently
used equivalent bandwidth expressions is included in the following
subsection. Incidentally, note that the value of $\gamma_0$ does not directly affect
the admission of CBR connections, for which PCR = SCR.

• Class U calls are accepted if on each link:

$$\sum_{\mathcal{C}_U} \mathrm{PCR} \le \beta \Big[ C - \sum_{\mathcal{S}_{CBR}} \mathrm{PCR} - \sum_{\mathcal{S}_{VBR}} \mathrm{SCR} \Big] \tag{4}$$

where $\mathcal{C}_U$ is the set of connections in Class U, $\mathcal{S}_{CBR}$ and $\mathcal{S}_{VBR}$ are the sets
of CBR and VBR connections, and $\beta$ is a bandwidth utilization coefficient,
possibly greater than 1, affecting the amount of leftover bandwidth that
can be allocated to Class U calls.

As can be understood from the above description, $\alpha$ and $\beta$ are CAC
parameters independent of the traffic mix, whereas $\gamma_0$ is more closely related
to the characteristics of bursty connections.
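
The two admission rules can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the representation of connections as plain rate lists are assumptions, and each list is taken to include the connection requesting admission.

```python
def mfcr(pcr, scr, gamma0):
    """Modified Cell Rate: SCR + (gamma0 / B) * (PCR - SCR), with B = PCR / SCR."""
    burstiness = pcr / scr
    return scr + (gamma0 / burstiness) * (pcr - scr)

def admit_class_1_2(class1_pcrs, class2_conns, capacity, alpha, gamma0):
    """Class 1/2 rule: sum of Class 1 PCRs plus sum of Class 2 MfCRs <= alpha * C.

    class2_conns is a list of (pcr, scr) pairs.
    """
    load = sum(class1_pcrs) + sum(mfcr(p, s, gamma0) for p, s in class2_conns)
    return load <= alpha * capacity

def admit_class_u(class_u_pcrs, cbr_pcrs, vbr_scrs, capacity, beta):
    """Class U rule: Class U PCRs must fit in beta times the leftover bandwidth."""
    leftover = capacity - sum(cbr_pcrs) - sum(vbr_scrs)
    return sum(class_u_pcrs) <= beta * leftover
```

For example, with rates in Mbit/s, `mfcr(10, 1, 2)` gives 2.8 and `mfcr(10, 0.1, 2)` gives 0.298 (up to floating-point rounding), while a CBR connection with PCR = SCR keeps its PCR whatever the value of `gamma0`.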



2.1 Modified Cell Rate and Equivalent Bandwidth 

The expression of MfCR in equation (3) is based only on the PCR and SCR 
traffic descriptors, which are provided by the user with the request of connec- 
tion setup. 

In order to relate the MfCR definition with frequently used equivalent band- 
width definitions, consider the case of ON/OFF VBR source models, according 
to which users have fixed cell generation rate p = PCR during ‘ON’ periods, 
and are silent during ‘OFF’ periods. 

The average offered cell rate of such users is m = SCR, which can be expressed
as:

$$m = p \, P(\mathrm{ON}) \tag{5}$$

where P(ON) is the probability of sources being in the ON period.

Moreover, for such user models, the variance of the offered cell rate can be
written as:

$$\sigma^2 = p^2 P(\mathrm{ON}) - m^2 = p^2 \frac{m}{p} - m^2 = mp - m^2 = m(p - m) \tag{6}$$

Hence we have:

$$\mathrm{SCR} = m \tag{7}$$

$$(\mathrm{PCR} - \mathrm{SCR}) = p - m = \frac{\sigma^2}{m} \tag{8}$$

and, recalling that $B = p/m$, from (3) we get:

$$\mathrm{MfCR} = m + \frac{\gamma_0}{B} \frac{\sigma^2}{m} = m + \gamma_0 \frac{\sigma^2}{p} \tag{9}$$
The CAC defined in (1) in this case states that VBR connections are accepted if:

$$\sum \mathrm{MfCR} \le \alpha C \tag{10}$$

that is, if:

$$\sum \left( m + \gamma_0 \frac{\sigma^2}{p} \right) \le \alpha C \tag{11}$$

This shows that with this type of sources the proposed CAC algorithm 
based on MfCR exhibits significant similarities with frequently used CAC 
algorithms based on equivalent bandwidth (EB) expressions, such as the one 
proposed by Lindberger (see Chapter 5 in [4] and [5]), in which the connection 
is accepted if: 



$$\sum \mathrm{EB} \le C \tag{12}$$

with:

$$\mathrm{EB} = a\,m + \frac{b}{C} \sigma^2 \tag{13}$$

where $C$ is the link capacity and $a$, $b$ are coefficients whose values Lindberger
evaluated so as to achieve a cell loss probability smaller than a given threshold,
and empirically found to be well matched to the following formulas:

$$a = 1 - \frac{\log_{10} P_{loss}}{50} = 1 - \frac{L_P}{50} \tag{14}$$

$$b = -6 a L_P = -6 L_P \left( 1 - \frac{L_P}{50} \right) \tag{15}$$

with $L_P = \log_{10} P_{loss}$, and $P_{loss}$ the maximum acceptable cell loss probability.
In his analysis, Lindberger had to adopt a model including a number of
simplifications, the most relevant being the assumption of a superposition of
an infinite number of sources generating fixed-length and fixed-rate bursts
according to a Poisson process, and the presence of a bufferless queue.

The advantage of Lindberger's expression is that it explicitly accounts
for a cell loss probability requirement, but the similarity of the CAC
rules based on MfCR and EB allows the selection of a value for $\gamma_0$ such that
the same performance requirement is met.

Indeed, we can observe that, with the assignment:

$$\alpha = \frac{1}{a} \tag{16}$$

$$\gamma_0 = \frac{b\,p}{a\,C} \tag{17}$$

(11) and (12) become identical.

Simulation results will be used in later sections to verify whether this ap- 
proach is actually capable of guaranteeing the expected upper bound on the 
cell loss probability in realistic networking environments. 
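
The correspondence between the two rules can be checked numerically. The sketch below is an illustration under stated assumptions: it takes the MfCR rule in the form sum(m + gamma0 * var / p) <= alpha * C and the EB rule in the form sum(a * m + b * var / C) <= C for ON/OFF sources with variance m(p - m); the function names are hypothetical and the value of `b` used in the usage example is purely illustrative, not taken from the paper.

```python
import math

def lindberger_a(p_loss):
    """Coefficient a = 1 - log10(P_loss) / 50."""
    return 1.0 - math.log10(p_loss) / 50.0

def on_off_variance(m, p):
    """Variance of the cell rate of an ON/OFF source: m * (p - m)."""
    return m * (p - m)

def match_mfcr_to_eb(a, b, peak_rate, capacity):
    """Return (alpha, gamma0) that make the MfCR rule coincide with the EB rule."""
    alpha = 1.0 / a
    gamma0 = b * peak_rate / (a * capacity)
    return alpha, gamma0

# Usage: one VBR source with m = 1 and p = 10 Mbit/s on a 150 Mbit/s link.
a = lindberger_a(1e-4)
b = 24.0  # illustrative value only
alpha, gamma0 = match_mfcr_to_eb(a, b, peak_rate=10.0, capacity=150.0)
var = on_off_variance(1.0, 10.0)
mfcr_term = 1.0 + gamma0 * var / 10.0   # per-connection term of the MfCR rule
eb_term = a * 1.0 + b * var / 150.0     # per-connection term of the EB rule
```

Multiplying the MfCR rule by `a` turns its right-hand side into `C` and each term into `a * mfcr_term`, which equals `eb_term`; this holds as long as all VBR sources share the same peak rate `p`.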



3 THE SIMULATION TOOLS 

The investigation of the effectiveness of the CAC scheme we just described 
is possible at both the cell level and the call level by using an integrated 
simulation environment for ATM networks developed at Politecnico di Torino, 
under contract with CSELT. 

Simulations are run by two distinct simulation tools working at the two 
different time scales: namely, ANCLES (Atm Networks Call LEvel Simulator) 
and CLASS (Cell Level Atm Services Simulator). Each simulator computes
metrics relevant to the time scale it is working at: while ANCLES allows the
estimation of call blocking probabilities and average link loads for different
CAC and routing schemes, CLASS addresses such performance parameters
as cell loss probabilities or cell delay distributions, besides assessing the
effectiveness of traffic management techniques such as shaping, policing and ABR
algorithms.

In our integrated simulation environment, call-level dynamics are explored 
by ANCLES; when a “critical” configuration in terms of congestion level and 
duration is found, its traffic pattern is fed to CLASS, which carries on the 
analysis at the cell level. 

However, the simulation control is never actually switched back and forth 
between the two simulators: the intervention of an operator, selecting the 
most “critical” configurations, is advisable, given the cost in terms of CPU 
time involved by cell-level simulations. Therefore, a set of ANCLES runs is 
usually followed by a number of CLASS simulations, according to the opera- 
tor’s choices. 

A more detailed description of ANCLES/CLASS interactions can be found
in [9].



4 THE SIMULATED SCENARIO 

The setting we chose to investigate the behavior of the CAC scheme described 
in Section 2 is quite simple. 

Fig. 1 depicts the ATM network configuration under examination. A net- 
work topology of the type usually referred to as parking lot is used. The 
capacity of all user-node and node-node links is set to 150 Mbit/s, while the 
length of each link varies so as to equalize the end-to-end delays between 
sources and destinations. 

As shown in Fig. 1, each node (numbered 1 through 4) is linked to 3 call
generators, or users (the circles in Fig. 1). Of course, the exact number of
connections opened by the users connected to a node (possibly zero) varies as
the call-level simulation experiment proceeds, and is determined by both the
offered traffic load and the CAC algorithm; however, users behave as
concentrators and the number of links connected to each node remains constant.

Figure 1 Simulated ATM network configuration

Three types of users are considered: CBR, non real-time ON/OFF VBR,
and Best Effort. One user of each type is connected to nodes 1, 2, and 3. Best
Effort users are modeled as greedy TCP traffic generators. The traffic
generated by TCP users is either shaped, or controlled according to the principles
of the ABR ATM transfer capability. CBR users activate connections with
constant bit rate (2 Mbit/s), and with holding times determined by i.i.d.
exponential random variables averaging at 1000 seconds. ON/OFF VBR users
open calls with PCR equal to 10 Mbit/s; once activated and admitted to the
network, VBR calls hold for an exponentially distributed period, with
average 1000 seconds; the SCR of VBR connections is variable: results are derived
for the cases SCR = 1 Mbit/s, and SCR = 100 kbit/s, corresponding to the
burstiness values $B = 10$ and $B = 100$, respectively. The execution of
call-level simulation runs requires no other information; however, when cell-level
simulations are run, the characterization of the ON/OFF VBR sources is also
needed: the duration of ON periods is taken to be exponentially distributed
with average $\mu_{ON} = 50$ ms; the OFF period duration is also exponentially
distributed, with mean $\mu_{OFF} = (B - 1) \cdot \mu_{ON}$.
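
The relation between burstiness and the mean OFF duration can be checked with a short sketch (function names are chosen here for illustration): the fraction of time spent ON must equal SCR/PCR = 1/B for the long-run rate to equal the SCR.

```python
def off_mean_ms(burstiness, on_mean_ms=50.0):
    """Mean OFF duration in ms: (B - 1) times the mean ON duration."""
    return (burstiness - 1.0) * on_mean_ms

def mean_rate(pcr, on_mean_ms, off_mean):
    """Long-run average rate of an ON/OFF source that emits at PCR while ON."""
    return pcr * on_mean_ms / (on_mean_ms + off_mean)
```

With a mean ON period of 50 ms this gives mean OFF periods of 450 ms for B = 10 and 4950 ms for B = 100, and `mean_rate(10, 50, off_mean_ms(10))` recovers the 1 Mbit/s SCR.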

As regards TCP users, each node is requested to support connections
with 10 Mbit/s PCR and average holding time equal to 60 seconds. TCP users
are assumed to activate long ftp file transfers that last for the whole connection
holding time. However, the actual average transfer rate on all TCP connections
is determined by the TCP protocol dynamics and by their interaction with
CBR and VBR traffic, which have a higher priority.

The data transferred by each connection, regardless of the source position,
are routed to a sink connected to node 4, creating a bottleneck on the link
between node 3 and node 4.

The allocation of connections within QoS classes generally depends on the 
QoS requirements of the end user. For the sake of simplicity we assume that 
all real time connections are CBR, while all non real time connections are 
VBR. For this reason, in the simulation runs that produced the results that 
will be discussed below, all connections generated by CBR users are mapped 
onto Class 1, all VBR connections are mapped onto Class 2, and all TCP 
connections are accommodated in Class U. The traffic load offered by Class 
1 calls is twice that offered by Class 2 calls. The nominal load of TCP con- 
nections is instead quite hard to define: TCP connections declare only their 
PCR, but the average traffic generated by them during cell-level simulations 
depends on the network load and the transfer capability supporting Class U 
calls, as already noted. 

As regards the parameters of interest to cell-level simulations, ATM
switches are assumed to have an output queued, non blocking architecture,
and each output interface comprises three separate output buffers for cells
belonging to connections of Classes 1, 2, and U. A fixed priority scheme is
employed to serve buffers, Class 1 being the highest priority, Class U the
lowest. The buffer size for Class 1 traffic is equal to 64 cells, since high priority,
real time traffic needs buffer space only for the resolution of cell scale
contention; the buffer sizes for Class 2 and Class U traffic are equal; two different
values are used in simulation experiments: either 1024 or 2048 cells.
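
An output interface of this kind can be sketched as three bounded queues served in strict priority order. This is only an illustration of the scheme described above, not the CLASS simulator's code; the class and method names are hypothetical.

```python
from collections import deque

class OutputPort:
    """Three per-class buffers served in fixed priority: Class 1, Class 2, Class U."""
    PRIORITY = ("1", "2", "U")

    def __init__(self, class1_size=64, other_size=1024):
        self.limits = {"1": class1_size, "2": other_size, "U": other_size}
        self.queues = {c: deque() for c in self.PRIORITY}
        self.lost = {c: 0 for c in self.PRIORITY}

    def enqueue(self, qos_class, cell):
        """Store a cell, or count it as lost if the buffer of its class is full."""
        queue = self.queues[qos_class]
        if len(queue) >= self.limits[qos_class]:
            self.lost[qos_class] += 1
            return False
        queue.append(cell)
        return True

    def dequeue(self):
        """Return the next cell to transmit, or None if all buffers are empty."""
        for qos_class in self.PRIORITY:
            if self.queues[qos_class]:
                return self.queues[qos_class].popleft()
        return None
```

Because a lower-priority buffer is served only when all higher-priority buffers are empty, Class U cells can wait arbitrarily long under load, which is why the larger buffers are given to Classes 2 and U.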



5 NUMERICAL RESULTS 

The CAC parameters used in our simulations are $\alpha = 1$ and $\beta = 4$, while
different values of $\gamma_0$ are examined.

The nominal load cumulatively offered by Class 1 and Class 2 traffic,
normalized with respect to the capacity of the congested link, is used as an
independent parameter for the call-level simulations. As already pointed out, the
load offered by TCP connections cannot be estimated a priori; however, the
TCP call generation rate is set so as to overload the network in almost any
condition, since the use of a large value of $\beta$ accounts for a very permissive
admission policy for best-effort, Class U traffic.

The simulation results we collected at the call level can be divided into two 
sets: 

• curves of the blocking probabilities for CBR, VBR and TCP connections, 
versus the nominal load offered to the bottleneck link by CBR and VBR 
connections 

• curves of the utilization of the bottleneck link by CBR and VBR connections,
versus the nominal load offered to the bottleneck link by CBR and
VBR connections

After having chosen the nominal loads that result in blocking probabilities
approximately equal to a target value (set to 0.01 in our simulations),
some critical congestion configurations are saved by the call-level simulator;
simulations at the cell level then provide the following results:

• CDV, CLR and average load of the bottleneck link for CBR and VBR 
connections; 

• throughput, goodput and CLR for TCP connections. 




Figure 2 Blocking probabilities vs. normalized offered load; $\gamma_0 = 2$, $B = 10$

The first set of results (Figs. 2, 3, 4 and 5) shows call blocking probabilities
referring to CBR, VBR and U Class connections, as a function of the traffic
load offered to the bottleneck link by CBR and VBR connections normalized
with respect to the link capacity (150 Mbit/s). The four different figures are
derived for different values of the burstiness of VBR connections ($B = 10$ and
$B = 100$), and of the parameter $\gamma_0$ ($\gamma_0 = 2$ and $\gamma_0 = 4$).

With $\gamma_0 = 2$ (Figs. 2 and 3), the blocking probabilities for U Class
connections are significantly larger than for CBR and VBR connections. Instead, the
blocking probabilities for CBR connections are smaller than for VBR
connections when the latter have burstiness $B = 10$, but the opposite is true when
the burstiness of VBR connections grows to $B = 100$. This is due to the fact
that with $B = 10$ the MfCR of VBR connections is larger than the PCR of
CBR connections, but it becomes smaller for $B = 100$.

It can be noted that blocking probabilities for CBR and VBR connections
with different burstiness appear to reflect the bandwidth allocation
determined by (1), resulting in blocking probabilities for CBR almost ten times
higher than VBR when $B = 100$ (this can easily be explained if we consider
that $\mathrm{PCR}_{CBR} = 2$ Mbit/s and $\mathrm{MfCR}_{VBR} = 0.298$ Mbit/s).

Figure 3 Blocking probabilities vs. normalized offered load; $\gamma_0 = 2$, $B = 100$

Figure 4 Blocking probabilities vs. normalized offered load; $\gamma_0 = 4$, $B = 10$

With $\gamma_0 = 4$ (Figs. 4 and 5), the blocking probability for U Class
connections is reduced to values smaller than for CBR and VBR connections.
However, increasing the value of $\gamma_0$ entails two significant drawbacks. First of
all, increasing $\gamma_0$ leads to increased blocking probabilities for CBR and VBR
connections, as seen by the comparison of Figs. 2 and 4, or Figs. 3 and 5. Second,
increasing $\gamma_0$ results in a lower utilization of the bottleneck link by CBR
and VBR connections (the traffic that should generate the largest percentage
of revenues for the network operator). This can be seen in Figs. 6 and 7, which
show curves of the bottleneck link utilization by CBR and VBR connections
versus the nominal load offered to the bottleneck link by CBR and VBR
connections, for values of $\gamma_0$ ranging from 0 to the value of the burstiness of
VBR sources (recall that for VBR connections, with $\gamma_0 = 0$, MfCR = SCR;
instead, with $\gamma_0 = B$, MfCR = PCR). The curves show that almost half of the
available capacity, and sometimes more, remains unused, unless the chosen
value for $\gamma_0$ is quite small, close to $\gamma_0 = 0$. Note however that the proposed
CAC algorithm can be seen to be capable of providing much better utilization
of the bottleneck link than a PCR-based CAC (curves with $\gamma_0 = B$).

Figure 5 Blocking probabilities vs. normalized offered load; $\gamma_0 = 4$, $B = 100$

Achieving a high utilization of the bottleneck link is just half of the task 
of a CAC algorithm; the second half of the task is guaranteeing the QoS of 
connections. In order to verify that this goal is also achieved, the simulation 
of the ATM network at the cell level is necessary. This was done by selecting a 
nominal network load such that the blocking probabilities for CBR and VBR 
connections are close to 0.01, and mapping TCP connections on either a UBR 
service with shaping at 10 Mbit/s, or an ABR service. In the latter case, ABR 
control is achieved using the ERICA (Explicit Rate Indication for Congestion 
Avoidance) algorithm [10] with target utilization 0.98. 

Cell-level simulations were run for 5 different configurations generated by
the call-level simulations; these configurations were deemed to be the worst
cases observed during several hours of simulated network activity. This
analysis was restricted to the case $\gamma_0 = 2$, which is more critical than $\gamma_0 = 4$. Results
obtained in the 5 different scenarios are then averaged. Cell-level results are
reported in Table 1. The eight columns with numerical values refer to ABR
and UBR, with either $B = 10$ or $B = 100$, and with Class 2 and Class U
buffers whose capacity is either 1024 or 2048 cells.

Figure 6 Average load on bottleneck link vs. normalized offered load, for
several values of $\gamma_0$; $B = 10$

Figure 7 Average load on bottleneck link vs. normalized offered load, for
several values of $\gamma_0$; $B = 100$

The first three rows provide throughput (S) results in Mbit/s for CBR and
VBR connections, while for TCP connections the goodput (Sq) is shown.
Goodput is defined as the useful throughput: goodput values are obtained by
discarding all corrupted and duplicated segments, that is, considering only
those segments that are useful to reconstruct the end user information. The
throughput of CBR and VBR connections depends only marginally on the
service class used for TCP connections, thanks to the use of separate buffers
and fixed priorities. The goodput of TCP connections, instead, is drastically
influenced by the service class choice: the ABR goodput is significantly larger
than the UBR goodput.

Table 1 Results obtained with CLASS for the 5 congested configurations
generated by ANCLES with γ0 = 2

The fourth and fifth rows provide results in Mbit/s for the amount of
bandwidth wasted by TCP connections (the resources used by the network
to deliver information useless for the TCP receiver) and for the portion of
bandwidth that goes unused on the bottleneck link. The results show
that the bandwidth wasted by TCP connections is quite small in the case of
ABR, but it is about 2/3 of the TCP goodput in the case of UBR with smaller
buffers, and about 1/2 with larger buffers.

The following three rows report loss probabilities for the three types of
connections. Loss probabilities are zero for CBR connections, in spite of the
very small available buffers; this is expected, given the priority
that is given to this class of traffic. Loss probabilities are also quite small for
VBR connections, because this traffic has priority over TCP traffic, and they
decrease further as the buffer size grows. Loss
probabilities for TCP traffic are instead quite high, especially when the UBR
service class is used.

The last rows report results for the average of the variable part of the cell
delay (μd) (that is, the difference between the actual cell delay and the
propagation and processing delays), and the standard deviation of the same
quantity (σd). All results are in μs. Averages and standard deviations of delay
variations are quite small for CBR connections. This is quite a positive result
which, together with the null CLR values, indicates that the QoS requirements for
CBR connections are satisfied.

Consider now VBR connections; values of the average and standard
deviation of delay variations are significantly larger, but this type of traffic can
tolerate such performance. The fact that the values of the average and
standard deviation of delay variations with B = 100 are much smaller than with
B = 10 is due to the reduction of the CBR traffic from 79 to 66 Mbit/s and to
a stable VBR traffic load; this results in shorter queues at the Class 2 buffer
within the switch, hence in shorter delays. Indeed, the increase in TCP traffic
that can be observed when B grows from 10 to 100 does not impact the VBR
traffic performance thanks to the priority service discipline.

Coming finally to TCP connections, we see that the values of the averages 
of delay variations are about four times larger with UBR than with ABR. 
Standard deviations are instead similar with the two service classes. 

In summary, the numerical results provided by cell-level simulations indicate 
that the QoS requirements of CBR and VBR connections can be satisfied with 
the proposed CAC, at least if the buffer for Class 2 traffic is large enough, and 
the value of γ0 is chosen carefully, a task that might not be easy a priori. As far
as TCP connections are concerned, a great difference is observed between the 
use of the UBR and the ABR service classes. With ABR, the obtained QoS is 
quite good, and the utilization of the network resources is very satisfactory. 
Instead, with the UBR service class, the obtained QoS is significantly worse, 
but, even more important, the exploitation of the available network resources 
shows dramatic inefficiencies. 



5.1 CAC and Cell Loss Ratio Guarantees 

As we observed in Section 2.1, the CAC algorithm that we considered can be 
reduced to the equivalent bandwidth CAC algorithm proposed by Lindberger, 
under the assumption of ON/OFF sources, exploiting expressions (16) and 
(17) for the choice of the values of a and γ0.

Lindberger’s equivalent bandwidth CAC algorithm defines its parameters
so as to be able to guarantee a specified cell loss ratio under rather
stringent conditions, which mainly consist of the assumptions of a superposition
of an infinite number of sources generating fixed-length and fixed-rate bursts
according to a Poisson process, and the presence of a bufferless queue.

Since the conditions under which CAC algorithms normally operate are
rather different from those assumed in Lindberger’s analysis, it is
interesting to verify under which conditions the cell loss ratio nominally offered by
the CAC is actually achieved.

For this reason we defined the parameters of our CAC algorithms so as to 
nominally achieve a cell loss ratio equal to 10“^, obtaining a = 0.926 and 
γ0 = 1.6, and we simulated the ATM network topology of Fig. 1 at both the
call and cell levels. 

The simulation at the call level, as usual, provided the configurations (5, as
in the previous Section) to be later investigated at the cell level in order to
obtain the cell loss ratio results.

The cell-level simulations were run assuming that ON/OFF VBR sources
always experience exponentially distributed OFF periods, whereas the
distributions of the ON periods were set to be either constant, or exponential, or
Pareto, with average equal to either 1, or 10, or 100 ms in the three cases.
The shape parameter a of the Pareto distribution was set to 1.5.
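
The three ON-period models just listed can be sketched as follows. This is only an illustration and not the simulator used for the results reported here: the function name `on_period_sampler` and its arguments are hypothetical, and the Pareto scale is derived from the requested mean under the assumption that 1.5 is the shape parameter of the distribution.

```python
import random


def on_period_sampler(kind, mean_ms, shape=1.5, rng=None):
    """Return a zero-argument function that draws ON-period lengths in ms.

    kind is "constant", "exponential" or "pareto" (hypothetical labels).
    """
    rng = rng or random.Random()
    if kind == "constant":
        return lambda: mean_ms
    if kind == "exponential":
        # expovariate takes the rate, i.e. the inverse of the mean.
        return lambda: rng.expovariate(1.0 / mean_ms)
    if kind == "pareto":
        # A Pareto variable with shape a > 1 and minimum x_min has mean
        # a * x_min / (a - 1); solve for x_min to obtain the requested mean.
        x_min = mean_ms * (shape - 1.0) / shape
        # paretovariate returns samples with minimum value 1.
        return lambda: x_min * rng.paretovariate(shape)
    raise ValueError("unknown ON-period distribution: %r" % kind)
```

With shape 1.5 the Pareto variance is infinite, so sample averages converge slowly to the nominal mean; this is consistent with the heavier correlation that long ON periods introduce in the cell streams.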

Results are reported in Fig. 8 as curves of the cell loss ratio versus the
buffer size. Simulations have been run for buffer sizes up to 4096 cells;
missing points mean that no cell loss was recorded in an overall simulated
time of 1.5 · 10^8 slots (roughly 420 s).

We can immediately observe that achieving cell loss ratios smaller than 
10“^ is quite difficult, and requires rather large buffers, specially in the cases 
of long ON periods, which translate into higher correlations in the cell streams.
On the other hand the distribution of the ON periods does not seem to have 
as big an influence, even if non-constant distributions yield a slightly higher 
CLR. 

It could be argued that these results are driven by the fact that our
scenario, characterized by a parking lot configuration and by a mix of CBR and
VBR sources, is quite different from the bottleneck configuration with only
homogeneous VBR sources used for the computation of the equivalent
bandwidth in [5]. In order to investigate this possibility we have simulated a single
bottleneck, indeed a multiplexer, collecting i.i.d. connections with the same
characteristics as the VBR connections used in previous scenarios. Setting the
target CLR to Eq. (12) yields a number of admitted connections equal to 537.

Fig. 9 reports the results obtained in the above scenario when the ON
periods are constant, since these are the validity conditions for Eqs. (12)-(15).
The first comment on these results is the striking similarity between
the curves in Fig. 9 and those in the upper plot of Fig. 8. Indeed we observe
that in all cases the influence of the average ON time duration is dominant
on the CLR: Eqs. (12)-(15) are derived without explicit knowledge of the
ON time duration, so that this behavior cannot be captured by this equivalent
bandwidth approach. A more detailed examination of the two plots shows
that when connections are multiplexed several times within the network, as in
the parking lot case, the CLR of connections with long ON times is somewhat
reduced, while the CLR of connections with very short ON times (1 ms)
worsens by nearly an order of magnitude: this observation clearly shows that
multiplexing is not always beneficial from the CLR point of view.

An additional remark on equivalent bandwidth CAC algorithms can be
drawn from Figs. 8 and 9: neither the examined network topologies nor the
distribution of the ON periods significantly influences the performance of equivalent
bandwidth CAC algorithms; however, the average ON time of connections has
a dramatic impact on CLR performance and must be taken into account in
CAC algorithms for VBR sources.



Figure 9 Cell loss ratio versus buffer size for VBR connections in a
multiplexer (constant ON time, bottleneck configuration) when the CAC
parameters are set so as to achieve CLR= 10“^

The results discussed in this Section indicate that equivalent bandwidth
CAC algorithms supporting cell-level performance guarantees should be
employed only in those network operating conditions for which they were derived;
otherwise, the actual network performance may be remarkably different from
what is expected.



6 CONCLUSIONS 

We described and evaluated by simulation a simple CAC algorithm for ATM 
networks supporting different QoS classes. 

The CAC algorithm proved to be capable of satisfying the QoS requirements
of CBR and VBR connections; moreover, if the ATM network implements
the ABR service class, the CAC algorithm can also guarantee quite a good
performance to TCP connections, as well as a very satisfactory exploitation
of the network resources. A similarly good behavior cannot be achieved with
the UBR service class.

The CAC algorithm was shown to be reducible to the equivalent bandwidth 
CAC algorithm proposed by Lindberger, in the case in which only ON/OFF 
VBR sources are present in the network. For this latter case some results were 
obtained showing the scope of validity of the above equivalent bandwidth CAC 
algorithm.

Abstract

This paper exploits classical control theory to design congestion control 
algorithms for "best effort" traffic in ATM networks. The control goal is the 
full utilisation of network links without incurring cell losses. A fluid model 
approximation of cell flows is assumed and linear differential equations are 
used to model the dynamics of network queues in response to ABR traffic 
and quality constrained (CBR+VBR) traffic. The available ABR bandwidth 
is modelled by means of an unknown and bounded disturbance function. A 
general end to end control algorithm that feeds back to the source the space 
that is free in the network buffers is included in the model and a particular 
one is proposed which is based on Smith's principle. The linearity and the 
simplicity of the proposed control law allow us to prove, via mathematical 
analysis, that the algorithm always guarantees no cell losses whereas full 
utilisation of network links is ensured if the capacity of per-VC buffers is at
least equal to the VC bandwidth-delay product. Moreover, per-VC queuing 
easily allows switches to enforce fairness. Finally, it is shown how 
performance evaluation can be easily carried out using SIMULINK for 
MATLAB, which is a software tool widely used by control engineers to 
simulate dynamic systems. In this way, the effort to develop discrete event 
simulations is saved. 



I. Introduction 

In recent years, intense research efforts have been focused on the issue of
transmitting multimedia traffic over a fully integrated universal network. To
this purpose, Broadband Integrated Service Digital Networks (B-ISDNs)
have been introduced and the emerging Asynchronous Transfer Mode
(ATM) technology has been retained as the transfer mode to be used in
B-ISDNs (Varaiya and Walrand, 1996). ATM networks seek to provide the
end-to-end transfer of fixed size cells with a specified quality of service.
The fixed size of the cells reduces the variance of transmission delay, making
the networks suitable for integrated traffic consisting of voice, video, and
data (ATM Forum, 1996; Jain, 1996).

An increasing amount of research has been devoted to different control
issues. These research efforts are concerned with ensuring that users get their
desired quality of service. The ATM Forum Traffic Management Group
defines five service classes to support multimedia traffic: 1) the Constant Bit 
Rate (CBR) class, which is conceived for applications such as telephone, 
video conferencing, and television; 2) the Variable Bit Rate (VBR) class 
which allows users to send at a variable rate. This category is subdivided 
into two categories: Real-Time VBR (RT-VBR), and Non-Real-Time VBR 
(NRT-VBR). An example of RT-VBR is interactive compressed video or 
industrial control (you would like a command sent to a robot arm to reach it 
before the arm crashes into something), while that of NRT-VBR is 
multimedia email; 3) the Unspecified Bit Rate (UBR) class which is 
designed for those data applications, such as email, file transfer, etc., that 
want to use any left-over capacity and are not sensitive to cell loss or delay; 
4) the Available Bit Rate (ABR) class which is designed for normal data 
traffic such as file transfer and email. This class does not require cell transfer 
delay to be guaranteed. However, the source is required to control its rate in 
order to take into account the congestion status of the network. In this way 
the Cell Loss Ratio (=Lost Cells/Transmitted Cells) is minimised, and 
retransmissions are reduced improving network utilisation. It should be 
pointed out that UBR does not require service guarantee: the drawback is 
that cell losses may result in retransmissions, which further increase 
congestion. The ABR service was defined to overcome this problem. It is the 
only class that responds to network congestion by means of a feedback 
control mechanism (ATM Forum, 1996; Jain, 1996). 

Congestion control is critical in both ATM and non-ATM networks and it
is the most essential aspect of traffic management (Jacobson, 1988). A key
issue is the “efficient coexistence” of quality-constrained services
(CBR+VBR) and “best effort” services (ABR+UBR). Many efforts have
been devoted to designing control algorithms for ABR input rates. Many of the
proposed algorithms lack a complete theoretical foundation because they
are derived using heuristic approaches. Nowadays, the interest in a theoretical
approach based on control theory is ever increasing (Benmohamed and
Meerkov, 1993). In fact, due to propagation delay, most algorithms exhibit
persistent oscillations and can even be unstable. In (Benmohamed and
Meerkov, 1993; 1994) an analytic method for the design of congestion
controllers, which ensure good dynamic performance along with fairness in
bandwidth allocation, has been proposed. However, the method requires a
complex on-line tuning of control parameters in order to ensure stability and
damping of oscillations under different network conditions. Moreover, it is
difficult to prove the global stability of the scheme due to the complexity of
the control strategy. A dual proportional-derivative (PD) controller has been
proposed in (Kolarov and Ramamurthy, 1997) to simplify the
implementation of this algorithm. In (Rohrs and Berry, 1997), a
proportional-integral (PI) controller has been proposed, and in (Fulton and
Li, 1997) an explicit rate algorithm (EPRCA) is illustrated which estimates
the background traffic (CBR+VBR) and the effective number of active
sources.

Algorithms that cannot be completely evaluated via mathematical
analysis need to be tested by means of computer simulations in order to
investigate their properties such as stability, fairness and full link utilisation.
This validation is partial because it is restricted to the simulated
scenarios. Moreover, since ATM networks belong to the class of Discrete
Event Systems, simulations always require considerable development effort
on the part of programmers.

In this paper, a fluid model approximation of cell flows is assumed and a 
classical control approach is used to model the dynamics of network queues 
in response to input traffic. The proposed model gives a general framework 
for modelling, designing and performance evaluation of linear feedback 
control algorithms. Moreover, Smith’s principle is chosen to design an 
efficient algorithm for throttling ABR input rates in high speed ATM 
networks. The feedback scheme uses circulating Resource Management 
(RM) cells in the ratio 1/NRM with data cells, while the intermediate nodes 
along the VC path stamp the space that is free in the buffers in the RM cells 
(ATM Forum, 1996). Stability and full link utilisation are shown via 
mathematical analysis even in the presence of large propagation delay. Unlike
other congestion control algorithms, such as, for instance, the one
reported in (Charny, Clark and Jain, 1995), our control law does not require
the measurement of available bandwidth, a hard task in the presence of bursty
traffic. Unlike the algorithm proposed in (Izmailov, 1995), where links with
constant ABR available bandwidth have been assumed, in this work the
“best-effort” ABR bandwidth is modelled by means of a time-varying,
unknown and bounded disturbance function. In (Zhao, Li and Sigarto, 1997)
an algorithm based on $H_2$ optimal control theory has been proposed, where
the ABR rate is only adapted to the low-frequency variation of the
underlying VBR traffic. With this approach, stability analysis for the
controlled network becomes possible assuming knowledge of the VBR
traffic characteristics.

The paper is structured as follows: in Sec. II the model of the system is
described; in Sec. III the general framework to design closed loop congestion
control algorithms is presented; in Sec. IV Smith’s principle is proposed to
design the control law; in Sec. V performance evaluation is carried out via
mathematical analysis; in Sec. VI the discrete time form of the control
equation is derived; in Sec. VII computer simulations using SIMULINK for
MATLAB (Simulink, 1992) are developed and, finally, the conclusions are
outlined.



II. The Model

2.1 The network and traffic model

The communication network can be considered as a graph (Fig. 1)
consisting of:

a) A set $N = \{1, \ldots, n\}$ of nodes (the switches);

b) A set $L = \{1, \ldots, l\}$ of communication links, each one characterised by the
transmission capacity $c_i = 1/t_i$ (cells/sec) and the propagation delay $t_{d_i}$.




Each node is characterised by the processing capacity $1/t_{pr_i}$ (cells/sec),
where $t_{pr_i}$ is the time the switch $i$ needs to take a packet from the input and
place it on the output queue. It is assumed that the processing capacity of
each node is larger than the total transmission capacity of its incoming links
so that congestion is caused by transmission capacity only. The network
traffic is contributed by source/destination pairs $(S,D) \in N \times N$. Each $(S,D)$
connection is associated with a Virtual Circuit (VC) mapped on the path $p(S,D)$.
Each switch output link maintains a per-VC first in first out (FIFO) queue. A
deterministic fluid model approximation of cell flow is assumed, that is, each
ABR input rate is described by a function of time $u(t)$ measured in
cells/sec. An ABR source is expected to declare only its peak cell rate, i.e. its
maximum transmission speed $c_s = 1/t_s$. It is worth noting that, in high-speed
wide area networks, a key parameter is the bandwidth-delay product $t_d/t_s$,
which represents a large number of cells “in flight” on the transmission link.

Remark 1: Per-VC queuing separates cells according to the flow to which
they belong. This allows the VCs to be completely uncoupled, that is, each
per-VC buffer stores only cells belonging to that VC. The fact that each flow
has exclusive buffers allows an algorithm entirely implemented at the source
to effectively and easily control congestion. Moreover, the round-robin
service can easily enforce fairness (Peterson and Davie, 1996).



2.2 The Feedback Control Scheme 

The closed loop control scheme proposed by the ATM Forum is assumed
(ATM Forum, 1996). In this scheme an ABR source sends one control cell
(RM cell) every $N_{RM}$ data cells. Each switch encountered by the RM cell
along the VC path stamps in the RM cell the space that is free in the buffer
associated with the VC only if this value is smaller than the one already stored.
At the destination, the RM cell carries the minimum available space over all
the encountered buffers, and it comes back to the source conveying this
value. Upon receiving this information, the source updates its input rate
(Fig. 2).
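
The stamping rule described above amounts to a running minimum along the path. The following sketch illustrates it; the function names, the list-based path representation and the initial value of the RM field (set here to infinity by the source) are assumptions made for the example and are not taken from the ATM Forum specification.

```python
def stamp_rm_cell(free_space_along_path, initial=float("inf")):
    """Return the value carried by an RM cell after crossing every switch.

    free_space_along_path lists, in path order, the free space (in cells)
    of the per-VC buffer at each switch.
    """
    stamped = initial
    for free_space in free_space_along_path:
        # A switch overwrites the field only if its own free space is smaller.
        if free_space < stamped:
            stamped = free_space
    return stamped


def cell_stream(n_data_cells, n_rm):
    """Yield 'RM' once before every group of n_rm data cells."""
    for i in range(n_data_cells):
        if i % n_rm == 0:
            yield "RM"
        yield "DATA"
```

For instance, `stamp_rm_cell([120, 35, 80])` returns the free space of the second buffer, which is the bottleneck seen by that RM cell.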




Fig. 2: End to end feedback control scheme using RM cells.

Fig. 3: Two VCs $(S_1, D_1)$ and $(S_2, D_2)$ sharing links $l_1$ and $l_2$ are shown.
Per-VC queuing is maintained at output links.



2.3 The Queue Model 

Two ABR connections, which share links $l_1$ and $l_2$ and maintain per-VC
queuing, are shown in Fig. 3. Let $x_{ij}(t)$ be the level of the queue associated
with virtual circuit $VC_i$ and link $l_j$. By writing the flow conservation
equation, the queue level at time $t$, starting at $t=0$ with $x_{ij}(0)=0$, is

$$x_{ij}(t) = \int_0^t \left[ u_{ij}(\tau - T_{ij}) - d_j(\tau) \right] d\tau \qquad (1)$$

where $u_{ij}(t)$ is the inflow rate due to the $VC_i$ connection, $T_{ij}$ is the
propagation delay from the $VC_i$ source to the $x_{ij}$ queue, and $d_j(t)$ is the rate of
packets leaving the queue, that is, the ABR available bandwidth.

Notice that an output link is shared by ABR, VBR and CBR traffic.
Therefore, the buffer depletion rate $d_j(t)$ depends on the network traffic loading
the link. Since it is difficult to measure the available ABR bandwidth, $d_j(t)$ is
modelled as an unmeasured disturbance.
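
As an illustration of the flow conservation equation (1), the sketch below integrates it numerically with a forward Euler step. The function name, the step size and the convention that the source is silent before $t=0$ are assumptions of the example. The code reproduces the linear model as written, so the computed level may become negative when the depletion rate exceeds the inflow; a physical queue would instead stay at zero.

```python
def queue_level(u, d, delay, t_end, dt=1e-3):
    """Euler integration of x(t) = int_0^t [u(tau - delay) - d(tau)] dtau.

    u and d are functions of time returning rates in cells/sec.
    The source is assumed silent before t = 0.
    """
    steps = int(round(t_end / dt))
    delay_steps = int(round(delay / dt))
    x = 0.0
    for n in range(steps):
        # Traffic sent at time tau - delay reaches the queue at time tau.
        inflow = u((n - delay_steps) * dt) if n >= delay_steps else 0.0
        x += (inflow - d(n * dt)) * dt
    return x
```

With a constant input rate of 1000 cells/sec, a constant depletion rate of 600 cells/sec and a propagation delay of 0.1 s, the level after 1 s is 1000 · 0.9 − 600 · 1 = 300 cells.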

Fig. 4 shows the block diagram of Eq. (1). Following control systems
terminology, the system in Fig. 4 is called the plant. The ABR input rate
$u_{ij}(t)$ is the control variable, i.e. the variable that can be throttled. The ABR
bandwidth $d_j(t)$ is the disturbance, which is assumed to be not measurable.
The queue level $x_{ij}(t)$ is the controlled or output variable, i.e. the variable
whose dynamics must be driven to the desired one by means of $u_{ij}(t)$ in the
presence of the disturbance $d_j(t)$.

Fig. 4: Block diagram of the plant: $u_{ij}(t)$ is the control variable, $d_j(t)$ is the
disturbance, $x_{ij}(t)$ is the output variable.



III. General Framework for Designing a Linear Controller for ABR Traffic

In this section, a general framework to design closed loop control 
algorithms for ABR traffic is proposed. The proposed scheme consists of a 
closed-loop mechanism feeding back to the ABR source the available buffer 
space encountered by RM cells along the VC path. The control goals are: 1) 
stability of network queues; 2) high utilisation of network links; 3) max-min 
fairness (Jaffe, 1981; Jain, 1996). 



3.1 A linear controller 

The idea of using ABR best-effort traffic to fully utilise bandwidth in 
ATM networks led to the introduction of closed-loop congestion control 
algorithms. Binary feedback schemes were first proposed (Jain, 1996; 
Benmohamed and Meerkov, 1993; Iliadis, 1995; Bonomi, Mitra and Seery, 
1995; Yin and Hluchyj, 1994). In these schemes, if the queue level of a 
switch is greater than a threshold, a binary digit is set in the RM cell. A 
consequence is that the controlled system is nonlinear even if, in this case, 
the plant is linear with transfer function $e^{-sT}/s$. Moreover, due to the
binary feedback conveyed by the RM cells, problems of stability and 
performance arise. 

Following these considerations and noting that RM cells have enough 
room to store and convey the available buffer space as feedback information, 
a linear feedback control is designed so that the controlled system (Fig. 5) 
remains linear. Linear systems have many appealing properties. Moreover, a
complete and well-established control theory exists for them.

Fig. 5: Block diagram of the controlled bottleneck queue level dynamics $x(t)$
in response to the ABR input rate $u(t)$ and to the available bandwidth
$d(t)$.



3.2 Model of the controlled ABR flows 

Each Virtual Circuit has a bottleneck link along its path. The
buffer feeding this link is the only one with a queue level greater than zero 
over all the buffers along the path. Thus, the congestion avoidance algorithm 
must guarantee that: 1) the bottleneck queue does not overflow (i.e. no cell 
loss); 2) the bottleneck queue level is always greater than zero (i.e. full link 
utilisation). 

The block diagram of the closed loop controlled bottleneck queue
dynamics in response to (ABR+VBR+CBR) traffic is shown in Fig. 5. In
particular, it consists of:

1) The connection bottleneck queue $x(t)$¹, which is modelled in the Laplace
domain by the integrator $1/s$;

2) The disturbance $d(t)$, which models the ABR bandwidth that is available
for the considered VC; $d(t)$ is an unknown and bounded function that
represents the bandwidth left available by coexisting (VBR+CBR+ABR)
traffic;

3) The transfer function $e^{-sT_{fw}}$, which models the propagation time $T_{fw}$
from the source to the bottleneck queue;

4) The transfer function $e^{-sT_{fb}}$, which models the propagation time $T_{fb}$
from the bottleneck queue to the destination and then back to the source;

5) The controller transfer function $G_c(s)$;

6) The VC input rate $u(t)$;

7) The reference signal $r(t)$, which sets a threshold for the bottleneck queue
level.

¹ From now on variable subscripts ij are omitted to simplify the notation.

The feedback control scheme works as follows: the source receives the
queue level and then inputs the difference $r(t)-x(t-T_{fb})$ into the
controller $G_c(s)$, whose output is the ABR input rate $u(t)$. The input rate
reaches the bottleneck queue after the forward propagation delay $T_{fw}$,
whereas the buffer free space is fed back to the source after the backward
propagation delay $T_{fb}$. We assume that the RM cells have priority over data
cells at the queues. As a consequence, the round trip time of the RM cells is
constant and equal to the round trip propagation time. A major advantage of
this assumption is that random queuing delays are zero and, therefore, the
round trip time inside the control loop is reduced to the minimum and is
constant.

Remark 2: The round trip time ($RTT$) of the VC connection is always
$T_{fw}+T_{fb}$ wherever the connection bottleneck queue may be positioned along
the VC path. Therefore, the proposed scheme also models the realistic case
of a moving bottleneck.



IV. The Control Law 

Fig. 5 shows the general scheme of the proposed linear closed loop
control law. The input rate seeks to fill the bottleneck queue, whereas the
available bandwidth seeks to empty this queue. To guarantee full utilisation
of network links in the presence of the disturbance $d(t)$, the control variable
$u(t)$ must pump enough traffic so that the queue is never empty. On the other
hand, to guarantee stability, i.e. to avoid congestion, the control $u(t)$ must
limit the intensity of data pumping.

Formally, queue stability can be stated as the objective of designing a
control law $u(t)$ for each ABR input rate, such that the bottleneck queue level
$x(t)$ satisfies the following stability condition:

$$x(t) \le r^o \quad \text{for } t > 0$$

where $r^o$ is the capacity of the buffer. Similarly, full utilisation of network
links can be guaranteed if the bottleneck queue level satisfies the following
efficiency condition:

$$x(t) > 0 \quad \text{for } t > T$$

where $T$ takes into account the round trip propagation time inside the
control loop. Clearly, this condition guarantees that any link always has data
to send.

Now, setting the reference value $r(t)$ to the bottleneck buffer capacity $r^o$,
the control action seeks to fill this buffer to its full capacity, whereas the
disturbance $d(t)$ tries to empty the queue. Due to the possibly large
propagation delays in the control loop, queue level dynamics might exhibit
oscillations, and even become unstable. Therefore the design of the linear
controller $G_c(s)$, which satisfies the stability and efficiency conditions, must
be carried out carefully.

We propose to design the controller $G_c(s)$ following the Smith principle,
which is an important classical control technique for time-delay systems 
(Smith, 1959; Marshall, 1979; Franklin, Powell and Emami-Naeini, 1994). 




Fig. 6: The linear controller $G_c(s)$ and the time-delay system $G(s)e^{-sT}$

Fig. 7: Desired input-output dynamics



4.1 The Smith principle

Smith's principle is well-known as an effective dead-time compensator
for a stable process with large time delay. Consider the goal of designing a
controller $G_c(s)$ for the time-delay system $G(s)e^{-sT}$ shown in Fig. 6.
Smith’s principle pursues the goal of designing a controller $G_c(s)$ such that
the resulting closed-loop dynamics is delay-free. More precisely, $G_c(s)$ is
chosen so that the system becomes equivalent to the reference system
reported in Fig. 7. This system consists of: 1) the delay-free “plant” $G(s)$; 2)
the controller $K(s)$; 3) the delay $e^{-sT}$, which is outside the feedback loop.

The chosen reference system is appealing because it is a delay-free
system whose output is delayed by the time $T$.

By equating the transfer functions of the systems in Fig. 6 and Fig. 7

$$\frac{K(s)G(s)}{1+K(s)G(s)}\, e^{-sT} = \frac{G_c(s)G(s)e^{-sT}}{1+G_c(s)G(s)e^{-sT}}$$

the required controller $G_c(s)$ results

$$G_c(s) = \frac{K(s)}{1+K(s)G(s)\left(1-e^{-sT}\right)} \qquad (2)$$



The block diagram of the controller is reported in Fig. 8. It should be
noted that, using the Smith principle, the problem of designing the controller
for the time-delay system in Fig. 6 has been reduced to the design of the controller
$K(s)$ for the delay-free system in Fig. 7 (Marshall, 1979). Notice that an
accurate model of the plant $G(s)e^{-sT}$ is necessary because this is part of
the controller $G_c(s)$.
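
The equivalence that defines the Smith controller can be checked numerically at any complex frequency. The sketch below is only a sanity check under assumed values: the helper names, the test point $s$, the delay and the choice $K(s)=5$, $G(s)=1/s$ are illustrative and do not come from the paper.

```python
import cmath


def smith_controller(K, G, s, T):
    """Controller Gc(s) = K(s) / (1 + K(s) G(s) (1 - exp(-sT)))."""
    return K(s) / (1 + K(s) * G(s) * (1 - cmath.exp(-s * T)))


def closed_loop_with_delay(gc_value, G, s, T):
    """Closed-loop transfer function of Gc in feedback with G(s) exp(-sT)."""
    loop = gc_value * G(s) * cmath.exp(-s * T)
    return loop / (1 + loop)


def reference_system(K, G, s, T):
    """Delay-free loop K G / (1 + K G) followed by the delay exp(-sT)."""
    kg = K(s) * G(s)
    return kg / (1 + kg) * cmath.exp(-s * T)
```

Evaluating `closed_loop_with_delay` with the controller returned by `smith_controller` gives the same complex number as `reference_system`, up to rounding, which is exactly the statement that the delay has been moved outside the feedback loop.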



Fig. 8: Block diagram of the controller $G_c(s)$

Fig. 9: Desired equivalent system



4.2 The proposed control law for ABR traffic

In wide area networks, the round trip time (RTT) of the VC connection
strongly influences the stability of the control algorithm. RTT is mostly
determined by the propagation delay. Nowadays, ATM vendors are building
switches that give priority to RM cells. This priority reduces the queuing
time of RM cells to zero and makes the round trip time constant and equal to the
round trip propagation time. This quantity is known in advance, when the
connection is established². Therefore, since the model of the plant is
accurate, Smith’s principle can be applied. In particular, the system reported
in Fig. 5, where $RTT = T_{fw} + T_{fb}$, can be made equivalent to the delay-free
system shown in Fig. 9. In fact, equating the transfer functions in Figs. 5 and
9

$$\frac{K(s)G(s)}{1+K(s)G(s)}\, e^{-sT_{fw}} = \frac{G_c(s)G(s)e^{-sT_{fw}}}{1+G_c(s)G(s)e^{-sRTT}}$$

the following controller is derived

$$G_c(s) = \frac{K(s)}{1+K(s)G(s)\left(1-e^{-sRTT}\right)} \qquad (3)$$



Now, choosing a simple proportional controller K(s)=k, where k is a. 
constant gain, and considering that G(s)=l/s, the transfer function of the 
equivalent system shown in Fig. 9 becomes 

/: + 5 



^ To take into account the jitter of round trip time due to queuing time, a model containing 
time varying delays could be considered. However, this would make hard to deal with the 
system using rigorous and simple mathematical analysis. 
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This is a first order system with the delay Tj^ in cascade. From basic 
control theory it is known that the step response of a first order system 
delayed by Tj^ is /\l-exp(t-Tf^)]‘l(t-Tf^), where is the magnitude of the 
step This response has two advantages: 1) it is bounded by 2) 

it has no dumped oscillations. 

The controller $G_c(s)$ is

$$G_c(s) = \frac{k}{1 + \dfrac{k}{s}\left(1 - e^{-s\,RTT}\right)} \tag{4}$$

which, in the time domain, gives the following input rate control equation

$$u(t) = k\left[r^{o} - x(t - T_{fb}) - \int_{t-RTT}^{t} u(\tau)\,d\tau\right] \tag{5}$$

This equation is very attractive due to its simplicity. It can be
intuitively interpreted as follows: the computed input rate is proportional,
through the coefficient k, to the available space in the bottleneck buffer, that
is $r^{o} - x(t - T_{fb})$, decreased by the number of cells released by the VC
during the last round trip time $RTT = T_{fw} + T_{fb} = T$†.
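
The step from the controller transfer function to the time-domain control equation can be made explicit. The following worked derivation is a sketch under the assumption that e(t) denotes the controller input, which for $t \ge T_{fb}$ equals $r^{o} - x(t - T_{fb})$; the symbol e(t) is introduced here only for illustration.

```latex
% Controller with K(s) = k and G(s) = 1/s:
U(s) = \frac{k}{1 + \dfrac{k}{s}\left(1 - e^{-s\,RTT}\right)}\,E(s)
% Cross-multiply:
U(s) + k\,\frac{1}{s}\,U(s) - k\,\frac{e^{-s\,RTT}}{s}\,U(s) = k\,E(s)
% Inverse Laplace transform, using
%   (1/s) U(s)              <->  \int_0^{t} u(\tau)\,d\tau
%   (e^{-s RTT}/s) U(s)     <->  \int_0^{t-RTT} u(\tau)\,d\tau
u(t) + k\int_{0}^{t} u(\tau)\,d\tau - k\int_{0}^{t-RTT} u(\tau)\,d\tau = k\,e(t)
% The two integrals combine into the cells sent during the last RTT:
u(t) = k\left[e(t) - \int_{t-RTT}^{t} u(\tau)\,d\tau\right]
```

The integral term is the amount of data released during the last round trip time, which is the "in flight" correction that the Smith predictor subtracts from the reported free buffer space.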

We will show that the proposed control scheme guarantees stability of 
network queues, full utilisation of network links and max-min fairness. 



V. Performance Evaluation 

Linear control theory provides an established set of tools which enables 
us to design algorithms whose performance can be predicted analytically 
rather than relying on simulations. In particular, to analyse the performance
of the proposed algorithm it is sufficient to use standard Laplace transform
techniques. The advantage of mathematical analysis is that it enables
properties to be demonstrated in a general setting, whereas the validation 
using simulations is always restricted to the simulated scenarios. Notice that, 
in our case, mathematical analysis is possible due to the simplicity of the 
proposed algorithm and to the assumption of per-VC queuing, which 
uncouples the flows. 



† Each $VC_i$ is characterized by its own round trip time $T_i$ and input rate
$u_i(t)$. The subscript i is omitted to simplify the notation.

5.1 The disturbance input

A link transmission capacity normalised to unity is assumed so that, if all
link bandwidth is available for the considered ABR flow, then d(t) is equal
to the step function 1(t). The coexistence of different ABR sources with
VBR and CBR traffic reduces the bottleneck bandwidth that is available for
a single VC to d(t) = 1(t) - b(t) > 0, where b(t) is the bandwidth used by the
coexisting traffic. By defining $b_m = \min_t \{b(t)\}$, it results

$$d(t) \le 1(t) - b_m \cdot 1(t) = a \cdot 1(t) \tag{6}$$

where $a = (1 - b_m) \le 1$. The step function $a \cdot 1(t)$ is a worst case
disturbance that models a bandwidth that is suddenly available at t = 0.



5.2 Stability 

Now we show that the controller (4) ensures queue stability in the 
presence of the disturbance (6). 

Proposition 1: The output of the system reported in Fig. 5, where $G_c(s)$ is
the controller (4), r(t) is the step function $r^{o} \cdot 1(t - T_{fb})$ and d(t) is the
step function $a \cdot 1(t)$, satisfies the stability condition $x(t) < r^{o}$ for t > 0.

Proof:

The reference signal $r^{o} \cdot 1(t - T_{fb})$ models the fact that the space that is
free at the bottleneck buffer, that is $(r^{o} - x(t))$, reaches the source after
the backward propagation delay $T_{fb}$ as $(r^{o} \cdot 1(t - T_{fb}) - x(t - T_{fb}))$.

Using the controller (4), the input-output dynamics of the system in Fig.
5 is equal to the input-output dynamics of the system in Fig. 9. Thus,
considering the system in Fig. 9, where K(s) = k, the Laplace transform of the
output in response to the reference signal $r^{o} \cdot 1(t - T_{fb})$ is

$$X_r(s) = \frac{r^{o}}{s}\,\frac{e^{-sT}}{1 + s/k}$$

By transforming back to time domain, it follows:

$$x_r(t) = r^{o}\left(1 - e^{-k(t - T)}\right) \cdot 1(t - T)$$
Moreover, using the controller (4), the transfer function from the
available bandwidth d(t) to the queue level $x_d(t)$ is given by (see Fig. 5)

$$\frac{X_d(s)}{D(s)} = -\frac{1}{s} + \frac{k}{s(s + k)}\,e^{-Ts}$$

which, for $D(s) = a/s$, gives:

$$x_d(t) = a\left[-t \cdot 1(t) + (t - T) \cdot 1(t - T) - \frac{1}{k}\left(1 - e^{-k(t - T)}\right) \cdot 1(t - T)\right] < 0$$

Thus, it follows that

$$x(t) = x_r(t) + x_d(t) < x_r(t) = r^{o}\left(1 - e^{-k(t - T)}\right) \cdot 1(t - T) < r^{o} \quad \text{for } t > 0.$$

This completes the proof. 
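
The bound of Proposition 1 can be illustrated by evaluating the closed-form responses directly. The sketch below assumes the expressions $x_r(t) = r^{o}(1 - e^{-k(t-T)}) \cdot 1(t-T)$ and $x_d(t) = a[-t \cdot 1(t) + (t-T) \cdot 1(t-T) - (1/k)(1 - e^{-k(t-T)}) \cdot 1(t-T)]$; the numerical values of $r^{o}$, a, k and T are hypothetical. Note that this is the linear fluid model, so the computed queue level is not clipped at zero.

```python
import math

def unit_step(t):
    """Unit step function 1(t)."""
    return 1.0 if t >= 0 else 0.0

def x_r(t, r0, k, T):
    """Queue response to the reference step of magnitude r0."""
    return r0 * (1 - math.exp(-k * (t - T))) * unit_step(t - T)

def x_d(t, a, k, T):
    """Queue response to the worst case disturbance a*1(t)."""
    ramp = -t * unit_step(t) + (t - T) * unit_step(t - T)
    decay = (1 - math.exp(-k * (t - T))) / k * unit_step(t - T)
    return a * (ramp - decay)

def queue_level(t, r0, a, k, T):
    """Total queue level x(t) = x_r(t) + x_d(t) of the linear model."""
    return x_r(t, r0, k, T) + x_d(t, a, k, T)

# Hypothetical parameters chosen for illustration only.
r0, a, k, T = 50.0, 0.5, 1.0 / 50.0, 20.0

samples = [queue_level(t, r0, a, k, T) for t in range(0, 1001)]
print("largest queue level:", max(samples))
print("queue level at t = 1000:", samples[-1])
print("r0 - a*(T + 1/k):", r0 - a * (T + 1.0 / k))
```

With these values the queue level never reaches $r^{o}$, and for large t it settles at $r^{o} - a(T + 1/k)$, which is the steady state value used in the next subsection.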



Remark 3: The queue dynamics is characterised by the time constant
$\tau = 1/k$. Therefore, the transient dynamics can be considered exhausted
after the time $T + 4\tau$.



5.3 Full utilisation of network links and max-min fairness 

The efficiency condition must be satisfied to guarantee full utilisation of 
network links in the presence of time varying ABR bandwidth. 



Proposition 2: The controller (4) guarantees the full utilisation of
network links in the presence of the worst case disturbance $a \cdot 1(t)$ if the
per-VC capacity of network queues satisfies the following condition

$$r^{o} > a(T + \tau) \tag{7}$$

Proof:

Considering the queue dynamics in response to the step input $r^{o} \cdot 1(t - T_{fb})$
and to the disturbance $a \cdot 1(t)$, it results

$$x(t) = x_r(t) + x_d(t)$$

with

$$x_r(t) = r^{o}\left(1 - e^{-k(t - T)}\right) \cdot 1(t - T)$$

and

$$x_d(t) = a\left[-t \cdot 1(t) + (t - T) \cdot 1(t - T) - \frac{1}{k}\left(1 - e^{-k(t - T)}\right) \cdot 1(t - T)\right]$$

For $t \gg T + 4\tau$, it results:

$$x(t) = x_s = r^{o} - a\left(T + \frac{1}{k}\right) \tag{8}$$

In order to fully utilise the bandwidth, $x_s$ must be greater than zero so that
the queue always has data to send. Thus, condition (7) is derived.



Proposition 3: The buffer capacity $r^{o} = a(\tau + T)$ ensures that, in steady
state condition, the ABR flow captures all available bandwidth $d(t) = a \cdot 1(t)$.

Proof:

Let $x_s$ and $u_s$ be the steady state values of x(t) and u(t). From Equation (5)
it follows

$$k\left(r^{o} - x_s - u_s T\right) = u_s$$

which gives

$$u_s = \frac{k\left(r^{o} - x_s\right)}{1 + kT} \tag{9}$$

Substituting (8) in (9) it results

$$u_s = a$$

that is, all ABR bandwidth is captured by the considered connection.
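
The steady state relations can be turned into a small buffer-sizing calculation. This is a sketch under the assumption that the steady state queue is $x_s = r^{o} - a(T + 1/k)$ and the steady state rate is $u_s = k(r^{o} - x_s)/(1 + kT)$; the function names and the numerical values are hypothetical.

```python
def minimum_buffer(a, k, T):
    """Smallest per-VC capacity for which the steady state queue is not negative."""
    return a * (T + 1.0 / k)

def steady_state_queue(r0, a, k, T):
    """Steady state queue level x_s."""
    return r0 - a * (T + 1.0 / k)

def steady_state_rate(r0, x_s, k, T):
    """Steady state input rate u_s."""
    return k * (r0 - x_s) / (1.0 + k * T)

# Hypothetical values: gain k = 1/50, available bandwidth a = 0.5,
# and two round trip times.
k, a, r0 = 1.0 / 50.0, 0.5, 100.0
for T in (20.0, 80.0):
    x_s = steady_state_queue(r0, a, k, T)
    u_s = steady_state_rate(r0, x_s, k, T)
    print(f"T = {T:5.1f}  min buffer = {minimum_buffer(a, k, T):6.1f}  "
          f"x_s = {x_s:6.1f}  u_s = {u_s:.3f}")
```

In both cases the computed $u_s$ equals the available bandwidth a, while the required buffer grows linearly with the round trip time.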



Proposition 4: The round-robin service of per-VC queues guarantees the 
fair allocation of available bandwidth to each VC. 
Proof:

The round-robin service discipline allocates the available bandwidth to
each VC flow in accordance with a given definition of fairness such as, for
instance, max-min fairness. In this way, the queuing service discipline,
which is executed at the switch, allocates the fraction $a_j$ of the available
bandwidth to the j-th VC. This bandwidth $a_j$ represents the disturbance for
the j-th controlled VC. Finally, notice that the queuing service discipline is
not “seen” by the congestion control algorithm, which is executed at the
source, that is, the issues of congestion control and fairness are completely
uncoupled.



VI. Discrete Time Control Equation 

By assuming a fluid model approximation of cell flows, a continuous time
model of traffic and queue dynamics in ATM networks has been derived
(Fig. 5). However, the feedback information is relayed in RM cells, and thus
it is not available in continuous time, but rather in sampled form. Therefore
the control equation (5) has to be transformed to discrete time form. This
means that the controller updates the input rate every $T_s$ units of time, where
$T_s$ is the sampling time. From the Shannon sampling theorem and digital
control theory (Astrom and Wittenmark, 1984), it is known that, in order to
have a "continuous like" performance of the system under digitised control,
the ratio of the time constant of the system over the sampling time must fall
within the interval (2,4), i.e.

$$\tau / T_s \in (2, 4)$$

To write the discrete time version of the control equation (5), two cases
are considered (Mascolo, Cavendish and Gerla, 1996):

1) $RTT > T_s$

Introduce the integer m and the real $\epsilon \in [0,1)$ so that
$RTT/T_s = m + \epsilon$. The discrete time control equation, at $t_k = kT_s$, gives the
input rate

$$u(t_k) = k\left[r^{o} - x(t_k - T_{fb}) - \sum_{i=1}^{m} u(t_k - iT_s)\,T_s - u(t_k - (m+1)T_s)\,\epsilon T_s\right]$$

2) $RTT < T_s$

$$u(t_k) = k\left[r^{o} - x(t_k - T_{fb}) - u(t_k - T_s)\,RTT\right]$$

Finally, we remark that the discrete time form of the control equation is
simpler to realise than the continuous one (Astrom and Wittenmark,
1984).
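
The sampled controller for the case where the round trip time exceeds the sampling time can be sketched as follows. This is a minimal illustration, assuming the rate update $u(t_k) = k[r^{o} - x(t_k - T_{fb}) - T_s \sum_{i=1}^{m} u(t_k - iT_s) - \epsilon T_s\, u(t_k - (m+1)T_s)]$, forward and backward delays that are integer multiples of the sampling time (so that $\epsilon = 0$ here), a constant available bandwidth a, and a linear fluid queue that is not clipped at zero. All names and numerical values are hypothetical.

```python
def control_rate(k, r0, x_fb, u_hist, Ts, rtt):
    """Input rate for RTT >= Ts; u_hist[-i] is the rate issued i samples ago."""
    m = int(rtt // Ts)
    eps = rtt / Ts - m
    in_flight = Ts * sum(u_hist[-i] for i in range(1, m + 1))
    in_flight += eps * Ts * u_hist[-(m + 1)]
    return k * (r0 - x_fb - in_flight)

def simulate(k, r0, a, Ts, n_fw, n_fb, steps):
    """Fluid queue fed by the delayed source rate and drained at rate a."""
    m = n_fw + n_fb
    rtt = m * Ts
    u = [0.0] * (m + 1)      # rates issued before the connection starts
    x = [0.0] * (n_fb + 1)   # queue levels, x[-1] is the current one
    for _ in range(steps):
        x_fb = x[-1 - n_fb]                 # queue level seen n_fb samples ago
        u.append(control_rate(k, r0, x_fb, u, Ts, rtt))
        arriving = u[-1 - n_fw]             # rate issued n_fw samples ago
        x.append(x[-1] + Ts * (arriving - a))
    return x, u

# Hypothetical values: tau = 1/k = 50, Ts = 20 (tau/Ts = 2.5), RTT = 80.
k, r0, a, Ts = 1.0 / 50.0, 100.0, 0.5, 20.0
x, u = simulate(k, r0, a, Ts, n_fw=2, n_fb=2, steps=200)
print("final queue level:", x[-1])
print("final input rate:", u[-1])
print("continuous-time steady state queue:", r0 - a * (4 * Ts + 1.0 / k))
```

Under these assumptions the sampled loop settles at the same steady state as the continuous-time analysis: the input rate converges to the available bandwidth and the queue to $r^{o} - a(RTT + 1/k)$.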



VII. Simulation results

Computer simulations are carried out to investigate the dynamic 
behaviour of an ABR input rate sharing a bottleneck link with 
(ABR+VBR+CBR) traffic. The available ABR bandwidth, normalised to 
unity, is modelled by means of an unknown and bounded disturbance 
function d(t). A constant gain k equal to 1/50 and a buffer capacity equal 
to 50 are assumed in equation (5). 

Fig. 10 shows the discrete time model of the system as it appears in the 
SIMULINK block diagram window. SIMULINK for MATLAB is a tool 
widely used by control engineers for simulating dynamic systems (Simulink, 
1992). The package is easy to use and spares programmers the effort of
developing discrete event simulations.

Figs. 11-13 show the dynamics of d(t), x(t) and u(t). The ABR connection
is characterised by the bandwidth-delay product of 20 cells that is typical of 
a LAN. Figs. 14-16 show the corresponding results for a connection 
characterised by a bandwidth-delay product of 80 cells, which is typical of a 
metro or regional WAN. 

Simulation results confirm theoretical analysis. The ABR steady state 
input rate Us captures all available bandwidth, and the steady state queue 
value Xs is in accordance with (8). 
Fig. 10: Model of the controlled system as it appears in the SIMULINK
block diagram window.

Fig. 16: Controlled ABR input rate u(t)



Conclusion 

Classical linear control theory has been exploited to propose a general 
framework for modelling, designing and performance evaluation of 
congestion control algorithms for ATM networks. In particular, Smith’s 
principle has been proposed as a convenient technique for controlling ABR 
traffic in data networks characterised by a large bandwidth-delay product. 
The proposed algorithm has several advantages. In particular: 1) it is simple 
and effective in a realistic scenario where many connections, with different 
round trip times, share a bottleneck link that can move along the connection 
path; 2) it guarantees no cell loss and full utilisation of network links; 3) it 
considers the interaction of ABR with VBR traffic by means of an unknown 
and bounded disturbance function that represents the available bandwidth. 
This makes it unnecessary for the switches to measure the available
bandwidth, a difficult task in the presence of bursty traffic such as VBR
traffic; 4) the controlled system dynamics is a first order one, that is, the
output converges exponentially to its steady state value.

Performance evaluation has also been carried out by means of computer
simulations using SIMULINK for MATLAB. The experimental results 
confirm the validity of the control theoretical model. Related and ongoing 
works are reported in (Cavendish, Gerla and Mascolo, 1995; Cavendish, 
Mascolo and Gerla, 1996; Mascolo and Gerla, 1997; Mascolo, Cavendish 
and Gerla, 1997; Mascolo, 1997). 
Abstract

Asynchronous Transfer Mode is the chosen transport mechanism for future 
broadband networks. Although there is a huge number of mathematical mod- 
els and experimental results, there are still many problems to be investigated. 
One of these problems is the way to provide guaranteed performance for real 
time traffic. In this paper we propose an algorithm for call admission control 
based on a simple and accurate estimate of the cell loss probability considering 
both cell scale and burst scale components. The loss probability is estimated 
from the asymptote of the tail probability by introducing a correction factor. 
Extensive numerical evaluations and simulations are made to evaluate the 
accuracy of the proposed algorithm. 
ATM, Real-Time Traffic, Cell Loss Probability, Tail Estimation, Bandwidth

1 INTRODUCTION

Due to its great flexibility, Asynchronous Transfer Mode (ATM) is widely
considered a suitable technique for realising the full integration of
transmission and switching for different kinds of services, such as voice,
video, and data. These services are known to have different traffic
characteristics (peak rate, mean rate, burst length, etc.), each with its own
specific performance requirements (real-time, non-real-time, loss sensitive,
etc.), which results in conflicting Quality of Service (QoS) requirements
(cell loss, transfer delay, cell delay variation, etc.).

One of the main areas of ongoing research is the way to carry real-time
traffic applications such as voice and video over ATM networks. In addition
to a specified cell loss probability (CLP), real-time traffic requires a strict
limit on the maximum end-to-end cell delay, beyond which arriving cells may
be considered lost. Because of this delay limit, the buffer used is usually
very small.

In ATM networks, cells from different connections interact with each other
at each switch. Without proper control, these interactions may adversely
affect the network performance. One of the most important issues in providing
guaranteed performance services is the choice of the cell service discipline
at ATM switches (Zhang 1995). The First-In-First-Out (FIFO) service discipline
is the most widely used queuing discipline. FIFO can provide only an average
performance for the aggregate traffic. For guaranteed Quality of Service (QoS)
traffic, performance should be on a per-connection basis. Therefore, we
emphasise using non-FIFO scheduling algorithms such as Weighted Round Robin
(WRR) (Kang et al. 1995, Rampal et al. 1995). Along with scheduling, Call
Admission Control (CAC) is required to provide guaranteed QoS.

An important goal behind ATM is the efficient utilisation of resources,
owing to the ability of ATM to provide statistical multiplexing. Because of
the small buffer size used with real-time traffic applications, many authors
assume that no statistical multiplexing gain can be achieved and therefore
propose very conservative CAC algorithms such as peak rate allocation (Mitrou
et al. 1996) or bufferless model schemes (Hsu et al. 1996). In this paper we
argue that a reasonable statistical gain can be achieved even though the
buffer size is small. This is done by introducing an efficient CAC which is
based on an accurate CLP estimation.

To develop an accurate estimate of the CLP of a small finite buffer we used
the idea of adjusting the tail distribution so as to provide a simple and
accurate closed form formula for the CLP. This is done through the development
of a correction factor that makes the necessary adjustment. This method
depends mainly on a simple analytical formulation combined with simulation
and numerical evaluations.

The approach generally accepted in the literature for providing efficient
link utilisation is to let best-effort traffic use the bandwidth left over by
the real-time traffic applications (Tsang et al. 1996). In this scheme,
traffic is classified into different classes, some of them for real-time
traffic applications and others for best-effort applications.

In our view, efficient utilisation of the link capacity means not only that
each slot is utilised by the incoming traffic, but also that each slot is
utilised by the cell that should be served. Because real-time traffic 1)
provides more revenue than data traffic, and 2) is expected to comprise a
large percentage of the network load in future broadband networks, the
efficient utilisation of resources assigned to real-time traffic becomes an
important issue. This matter will be more crucial if the link is solely used
to transport real-time traffic (i.e. no best effort traffic) (Tsang et al. 1996).

Therefore, we argue that the above mentioned algorithm is not optimum 
from the efficiency point of view unless provided with efficient algorithms for 
CAC and bandwidth assignment. 



2 SYSTEM MODEL 

2.1 Source Modelling 

We model a single source using the On-Off source model, where each source is
modelled as a two-state Markov chain. The On-Off source model can be
considered as the output of the shaping device at the user interface;
therefore it may represent an actual traffic characterisation.

To describe the On-Off sources we assume the source switches from the On
state to the Off state with probability $1 - P_{11}$ and from the Off state to the
On state with probability $1 - P_{00}$. Both On and Off durations are assumed to be
independent and geometrically distributed. Therefore, the average On period
(burst length) $T_a$ is given by $T_a = 1/(1 - P_{11})$ and the average Off period $T_s$
is given by $T_s = 1/(1 - P_{00})$. The source activity factor p is given by
$p = T_a/(T_a + T_s)$, so the burstiness $\beta$ is given by $\beta = 1/p$. During the On
period the source generates information with peak rate $R_p$ bps, while during
the Off period no information is generated. Accordingly, the mean rate m is
given by $m = R_p\,p$. The ratio of the service rate to the peak rate of the source
is denoted by M. For numerical experiments we consider five classes of
real-time traffic applications of different traffic characteristics. Table 1
provides the parameter values of these applications.
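
The source parameters defined above follow directly from the two transition probabilities. The sketch below assumes the relations $T_a = 1/(1 - P_{11})$, $T_s = 1/(1 - P_{00})$, $p = T_a/(T_a + T_s)$, $\beta = 1/p$ and $m = R_p\,p$; the function name and dictionary keys are hypothetical. As an example it uses transition probabilities chosen so that the mean burst length is 58 cells and the burstiness is 2.9, with a peak rate of 0.064 Mbps.

```python
def on_off_parameters(p11, p00, peak_rate):
    """Derive On-Off source descriptors from the two-state Markov chain.

    p11: probability of staying in the On state in one slot
    p00: probability of staying in the Off state in one slot
    """
    mean_on = 1.0 / (1.0 - p11)        # average On period (burst length)
    mean_off = 1.0 / (1.0 - p00)       # average Off period
    activity = mean_on / (mean_on + mean_off)
    return {
        "mean_on": mean_on,
        "mean_off": mean_off,
        "activity": activity,
        "burstiness": 1.0 / activity,
        "mean_rate": peak_rate * activity,
    }

# Mean burst of 58 cells and burstiness 2.9 imply a mean Off period of
# 58 * (2.9 - 1) = 110.2 cells.
voice = on_off_parameters(p11=1 - 1 / 58, p00=1 - 1 / 110.2, peak_rate=0.064)
for name, value in voice.items():
    print(f"{name:>10}: {value:.4f}")
```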



2.2 Overall System Model 

Our simplified model for a statistical multiplexer of an ATM network node
consists of output buffered switches with nonblocking switch fabrics. Switch
output nodes are organised as parallel FIFOs which share the output link’s
capacity V via the Weighted Round Robin (WRR) scheduling algorithm. Scheduling
has the effect of giving each service class access to its share of bandwidth,
as if each service class had its own server at its given rate. Therefore, we
assume that traffic is classified into K different classes according to the
QoS requirements and traffic characteristics such as peak rate, mean rate and
mean burst length.
Application   Peak rate (Mbps)   Burstiness   Burst length (cells)
Voice         0.064              2.9          58
Videotel      2                  2.5          212
MPEG1         1.856              4            2570
HDTV          30                 4.6          12264
Image         2                  23           2604

Table 1 Parameter values for different applications



Each class will be assigned its own buffer and individual share of the total
link capacity to support its QoS. There are K admission controllers, one
associated with each traffic class. A bandwidth assignment controller is
linked to all admission controllers and the multiplexer. The bandwidth
assignment controller, according to the network state, allocates a fraction
$C_i$ (we call it the service rate of class i) of the total link capacity to
each class i such that it satisfies the required QoS. This capacity will be
modified over time as connections are dynamically set up and torn down. In
this way, the analysis of heterogeneous traffic is simplified into the case of
homogeneous traffic. The CAC problem then reduces to the analysis of a
single-class single queue with its specific service rate.

In our analysis we assume that the delay constraint is met by limiting the
buffer size; therefore, the cell loss probability is the only measure of
performance. Hence we assume that each class has a specific required cell
loss probability denoted by $\epsilon_i$. For each class i, we assume that $N_i$
independent and identically distributed (i.i.d.) On-Off sources share a buffer
of finite size $B_i$ cells so that the maximum queuing delay is $\tau_i$ seconds. The
total load of each class is given by $\rho_i = N_i p_i / M_i$.

According to the non-work-conserving WRR scheduling algorithm, let the
cycle length be W cells, and let each queue have a quota of $g_i$ cells. Then the
service rate of each queue is given by:

$$C_i = \frac{g_i}{W}\,V \tag{1}$$



In this paper we make the analysis for one class so we omit the subscript i. 
3 TAIL ESTIMATION

The tail of the queue length distribution is defined as the probability that
the steady state content of the infinite queue exceeds a certain value B, i.e.,
$P_{tail} = Q(B) = \Pr(Q > B)$, where Q denotes the steady state buffer content.
Generally it is difficult to evaluate the tail probability exactly; therefore,
the tail distribution itself is usually approximated. To develop an
approximate estimate of the tail we use the decomposition approach, where we
model the queuing system as the superposition of two separate components,
namely the cell scale, $Q_{cell}(B)$, and burst scale, $Q_{burst}(B)$, components
(Mignault et al. 1996). Therefore, we can write the tail as:

$$P_{tail} = Q_{cell}(B) + Q_{burst}(B) \tag{2}$$



3.1 Burst Scale Component 

The burst scale component $Q_{burst}(B)$ is usually estimated using the large
buffer approximation, i.e. the burst scale component is approximated by the
tail of an infinite queue. Because the evaluation of the tail is not simple,
we develop a simple approximation to the tail for the burst scale component
using a discrete time queuing model. For discrete-time queuing systems, it has
been observed that the steady state queue length distribution exhibits a
geometrically distributed tail (Bruneel et al. 1996, Ishizaki et al. 1995,
Sohraby 1993). That is, for sufficiently large buffer size B, we have:

$$Q_{burst}(B) \approx A \cdot z_0^{-B} \tag{3}$$

where $z_0$ is called the dominant root and A is called the leading factor, to
be determined. The dominant root is relatively simple to evaluate. Sohraby
(Sohraby 1993) gives several approximations for $z_0$ for different On-Off source
models.

(a) The Leading Factor 

The evaluation of the leading factor A is not as simple and needs some
mathematical analysis. Different proposals have been introduced in the
literature; here we review some of them and then propose an approximation
for the leading factor A.

• Effective Bandwidth Approximation

In what is called the effective bandwidth approximation, the leading factor
is put equal to 1, i.e.,

$$Q_{burst}(B) = z_0^{-B} \tag{4}$$
• Sohraby Approximation

Based on the heavy traffic assumption, Sohraby simply put A equal to the
total load $\rho = Np/M$, i.e.,

$$Q_{burst}(B) = \rho \cdot z_0^{-B}$$

• Ishizaki et al. Approximation

In an attempt to reduce the conservatism of Sohraby’s method, Ishizaki et al.
(Ishizaki et al. 1995) proposed a heuristic approximation of the leading
factor A as follows:

$$A = A_{Soh}^{1/2} \cdot A_{Geo}^{1/2} \tag{5}$$

where $A_{Soh}$ is the leading factor of Sohraby’s proposal, and $A_{Geo}$ is the
geometric mean of the leading factors of the lower and upper bounds of the
tail probability, which are given in (Ishizaki et al. 1995) through
considerably complex formulas.

• Proposed Approximation

We think of the leading factor A as the value of the tail, Pr(Q > B),
when B = 0, that is, A = Pr(Q > 0). The probability Pr(Q > 0) can be
related to the saturation probability that the input rate exceeds the service
rate, i.e., the bufferless saturation probability $P_{zero,sat}$, as follows
(Artiges et al. 1996):

$$P_{zero,sat} < \Pr(Q > 0) \tag{6}$$

Hence, the probability $P_{zero,sat}$ can be considered as an approximation
of Pr(Q > 0). Accordingly, in the next section we review the bufferless
approximation, where we develop an estimate of the leading factor A using
the Bahadur-Rao formula.



3.2 Different Bufferless Approximations 

For the bufferless model the saturation probability Pr(X > C) is given by:

$$P_{zero,sat} = \Pr(X > C) = \sum_{k=\lceil M \rceil}^{N} P_k \tag{7}$$

where $P_k$ is the binomial distribution given by:

$$P_k = \binom{N}{k}\,p^{k}\,(1 - p)^{N-k} \tag{8}$$
The average cell loss probability $P_{zero,loss}$ is given by:

$$P_{zero,loss} = \frac{1}{Np} \sum_{k=\lceil M \rceil}^{N} P_k \cdot (k - M) \tag{9}$$

It is clear that $P_{zero,sat}$ is easier to obtain than $P_{zero,loss}$. Equation (9)
is usually called the Virtual Cell Loss Probability (VCLP).

While equations (7) and (9) can be evaluated exactly, they become too
difficult for on-line calculations for large N; therefore, they are usually
approximated. Several approximations have been used, such as the Gaussian
approximation and large deviation approximations. The Gaussian approximation
may be very inaccurate: it may overestimate or underestimate the saturation
probability. Another well known approach is to apply large deviations
approximations using the Chernoff bound. In this case, for the non-negative
random variable X and positive constant C, we have for $\theta$ positive:

$$\Pr(X > C) \le \frac{E\left[e^{\theta X}\right]}{e^{\theta C}} \tag{10}$$

where E[x] is the expected value of x.

There is an optimum choice of the parameter $\theta$ which minimises (10) for
given C. Using simple mathematical formulation the Chernoff bound for N
i.i.d. On-Off sources can be written as:

$$\Pr(X > C) \le \left(\frac{p}{a}\right)^{Na} \left(\frac{1 - p}{1 - a}\right)^{N(1 - a)} \tag{11}$$

where $a = M/N$.
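
The bufferless quantities can be evaluated directly for moderate N. The following sketch assumes that the saturation probability is the binomial tail from $k = \lceil M \rceil$ to N, that the virtual cell loss probability is $\frac{1}{Np}\sum_k P_k (k - M)$ over the same range, and that the Chernoff bound takes the form $(p/a)^{Na}((1-p)/(1-a))^{N(1-a)}$ with $a = M/N$. The function names are hypothetical; the example values N = 60, p = 1/11 and M = 18.2 correspond to sources with a mean On period of 60 cells and a mean Off period of 600 cells.

```python
import math

def binomial_pmf(N, p, k):
    """Probability that exactly k of N independent sources are On."""
    return math.comb(N, k) * p**k * (1 - p) ** (N - k)

def saturation_probability(N, p, M):
    """Probability that the number of active sources exceeds M."""
    return sum(binomial_pmf(N, p, k) for k in range(math.ceil(M), N + 1))

def virtual_cell_loss(N, p, M):
    """Expected excess rate divided by the mean offered rate (peak-rate units)."""
    excess = sum(binomial_pmf(N, p, k) * (k - M)
                 for k in range(math.ceil(M), N + 1))
    return excess / (N * p)

def chernoff_bound(N, p, M):
    """Chernoff upper bound on the saturation probability, with a = M/N."""
    a = M / N
    return (p / a) ** (N * a) * ((1 - p) / (1 - a)) ** (N * (1 - a))

N, p, M = 60, 1 / 11, 18.2
p_sat = saturation_probability(N, p, M)
p_vclp = virtual_cell_loss(N, p, M)
p_chernoff = chernoff_bound(N, p, M)
print(f"saturation probability: {p_sat:.3e}")
print(f"virtual cell loss:      {p_vclp:.3e}")
print(f"Chernoff bound:         {p_chernoff:.3e}")
```

For these values the Chernoff bound lies well above the exact saturation probability, which illustrates why a refinement of the bound is of interest.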



3.3 Bahadur Rao Bound 

Chernoff’s bound as given by the upper bound (11) often overestimates the
saturation probability Pr(X > C). A more refined approximation can be
obtained using the Bahadur-Rao theorem (Hsu et al. 1996). If $\theta$ is given, then
we can write the inequality in (11) with equality as follows:

$$\Pr(X > C) = \left(\frac{p}{a}\right)^{Na} \left(\frac{1 - p}{1 - a}\right)^{N(1 - a)} \cdot \xi(\theta) \tag{12}$$

where $\xi(\theta)$ is a correction factor.

Bahadur and Rao used the idea of shifting the most accurate point of the
estimation from the region of the original mean value to the interesting
region of very small saturation probabilities. The shifted distribution can be
approximated accurately by a Gaussian distribution around its own mean.
Accordingly, the correction factor is given by:

$$\xi(\theta) = \frac{1}{\sqrt{2\pi N}\,\sigma\,\theta} \tag{13}$$

where $\theta = \ln\left(\dfrac{a(1 - p)}{p(1 - a)}\right)$ and $\sigma$ is the standard deviation of the
shifted distribution.

Accordingly, we have the following approximation for the saturation
probability using the Bahadur-Rao theorem, $P_{BR,sat}$:

$$P_{BR,sat} = \frac{1}{\sqrt{2\pi N}\,\sigma\,\theta} \left(\frac{p}{a}\right)^{Na} \left(\frac{1 - p}{1 - a}\right)^{N(1 - a)} \tag{14}$$

The Bahadur-Rao approximation that is equivalent to the virtual cell loss
probability of equation (9) is given by:

$$P_{BR,loss} = \frac{P_{BR,sat}}{\theta\,N\,p\,R_p} \tag{15}$$

Equations (14) and (15) are applicable to any system with any number of
independent sources. They are also accurate and relatively easy to calculate.

Accordingly, we propose to approximate the leading factor A by the
Bahadur-Rao loss probability, $P_{BR,loss}$, as given by equation (15).
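
The Bahadur-Rao refinement of the saturation probability can be compared with the exact binomial tail. This is a sketch under several assumptions: the estimate is the Chernoff expression divided by $\sqrt{2\pi N}\,\sigma\,\theta$, with $a = M/N$ and $\theta = \ln\left(a(1-p)/(p(1-a))\right)$, and $\sigma^2 = a(1-a)$ is taken as the variance of the shifted On-Off distribution, which is an assumption of this example. The function names are hypothetical.

```python
import math

def chernoff_bound(N, p, M):
    """Chernoff upper bound on the saturation probability, with a = M/N."""
    a = M / N
    return (p / a) ** (N * a) * ((1 - p) / (1 - a)) ** (N * (1 - a))

def bahadur_rao_saturation(N, p, M):
    """Chernoff expression scaled by the Bahadur-Rao correction factor."""
    a = M / N
    theta = math.log(a * (1 - p) / (p * (1 - a)))
    sigma = math.sqrt(a * (1 - a))   # assumed std of the shifted distribution
    correction = 1.0 / (math.sqrt(2 * math.pi * N) * sigma * theta)
    return correction * chernoff_bound(N, p, M)

def exact_saturation(N, p, M):
    """Exact binomial tail, used only as a reference value."""
    return sum(math.comb(N, k) * p**k * (1 - p) ** (N - k)
               for k in range(math.ceil(M), N + 1))

N, p, M = 60, 1 / 11, 18.2
exact = exact_saturation(N, p, M)
chernoff = chernoff_bound(N, p, M)
refined = bahadur_rao_saturation(N, p, M)
print(f"exact tail:     {exact:.3e}")
print(f"Chernoff bound: {chernoff:.3e}")
print(f"Bahadur-Rao:    {refined:.3e}")
```

In this example the refined estimate is roughly an order of magnitude tighter than the Chernoff bound and stays within a small factor of the exact value, at the cost of one logarithm and one square root.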



3.4 Cell Scale Component 

One proposal to model the cell scale component is to use the M/D/1 model. A
more accurate model is the N*D/D/1 queuing system (Fiche et al. 1994,
Mignault et al. 1996).

Although the exact solution of the N*D/D/1 queue is relatively
straightforward, it may not be simple enough for fast calculations. Based on
the heavy traffic assumption and using the Brownian Bridge approximation
method, an approximation is provided in (Pitts et al. 1996). This
approximation underestimates the cell loss for low utilisation. In (Fiche et
al. 1994) a much better approximation is derived, which is good for all
traffic intensities. For the cell scale component $Q_{cell}(B)$, we use this
approximation, which is given by:

Qceii(B) ^^■exp(-B-{^ + l-p- ln{p))^ . (16) 
4 ACCURACY OF THE PROPOSED ESTIMATE OF THE TAIL

Now we investigate the accuracy of the proposed tail by comparison with
some exact and approximate tail distributions. Some results are shown in
Figure 1 and Figure 2. For the exact tail calculations we present two different
algorithms. The first is the algorithm of Anick, Mitra and Sondhi (Anick
et al. 1982), which is based on the fluid flow model; we denote it by AMS exact
tail. The other algorithm is the one developed by (Choudhury et al. 1996)
which is based on ^G/G/1 queuing model, we call this G/G/1 exact tail. 
For the G/G/1 exact tail we did not carry out the numerical calculations;
instead we reproduced the results directly from (Choudhury et al. 1996).

Figure 1 Accuracy of the proposed tail estimation with small load and
small buffer size, ρ = 0.3, M = 18.2

Figure 2 Accuracy of the proposed tail estimation with small load and
small buffer size, ρ = 0.3, M = 7.3



In Figure 1 and Figure 2 we compare, for small load and small buffer sizes,
the proposed approximation of the tail with the exact tails of AMS and G/G/1.
The results for G/G/1 are reproduced from Figure 7 and Figure 1 in (Choudhury
et al. 1996), respectively. These figures also include the tail approximations
proposed by Sohraby and Ishizaki et al. The source parameters for Figure 1 are
as follows: N = 60, $T_a$ = 60 cells, $T_s$ = 600 cells, ρ = 0.3 (hence, M = 18.2).
For Figure 2 the only change is that the number of sources is now 24 (hence,
M = 7.3). From these two figures we see that the proposed tail approximately
follows the G/G/1 exact tail in the buffer range indicated. The Ishizaki et al.
tail approximation does not differ much from the proposed tail. The Sohraby
tail approximation is very conservative compared to all the other tail
estimations. Experiments with large buffer sizes have also been carried out,
and the proposed tail proved to be very accurate.
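
The figure parameters can be cross-checked against the load definition. The short sketch below assumes the activity factor $p = T_a/(T_a + T_s)$ and the total load $\rho = Np/M$; the function names are hypothetical.

```python
def activity_factor(mean_on, mean_off):
    """Fraction of time a source spends in the On state."""
    return mean_on / (mean_on + mean_off)

def total_load(num_sources, activity, service_to_peak_ratio):
    """Total load rho = N * p / M."""
    return num_sources * activity / service_to_peak_ratio

p = activity_factor(60, 600)            # On period 60 cells, Off period 600 cells
rho_figure_1 = total_load(60, p, 18.2)  # N = 60, M = 18.2
rho_figure_2 = total_load(24, p, 7.3)   # N = 24, M = 7.3
print(f"activity factor p = {p:.4f}")
print(f"load for Figure 1 = {rho_figure_1:.3f}")
print(f"load for Figure 2 = {rho_figure_2:.3f}")
```

Both configurations give a load of about 0.3, in agreement with the values quoted in the figure captions.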
5 VALIDITY OF USING THE TAIL DISTRIBUTION AS AN
ESTIMATE OF THE CELL LOSS PROBABILITY OF FINITE
BUFFER

Usually the tail of the infinite buffer is used to approximate the CLP of the
respective finite buffer queueing system. The question to be answered is how
accurate that approximation is (Belhaj et al. 1997). The well known
idea is that the tail distribution $P_{tail} = \Pr(Q > B)$ of an infinite buffer
configuration is always an upper bound to the respective finite buffer one,
$P_{loss}$; see for example (Mitrou et al. 1996). On the other hand it is stated in
(Bisdikian et al., Mignault et al. 1996) that the above statement is true only
for heavy traffic cases. Therefore the relation between the tail probability
and the CLP of the corresponding finite buffer should be investigated. We
checked this matter by comparing the exact tail and other tail approximations
with the exact CLP obtained by simulation.

In Figure 3 we give one of a set of numerical examples that compare the cell
loss approximated by the exact tail (evaluated using the AMS exact tail) with
simulation for different numbers of sources (i.e. loads). We see clearly
that the tail overestimates the cell loss at high loads and underestimates it
at small loads.

A contradictory result is obtained in Figure 4 (the source parameters are the
same as in Figure 2), where the tail overestimates the cell loss even though
the load is very small (ρ = 0.3). This result indicates that the load is not
the only factor that affects the accuracy of the tail as an estimate of the
cell loss probability; other parameters may also have some effect. This
matter is investigated further in section 6.
Figure 3 Comparison of the exact CLP produced by simulation and the
exact tail overflow probability. Image source, M = 15.1, R = 100

Figure 4 The tail overestimates the cell loss probability even with small
load, ρ = 0.3
6 CELL LOSS PROBABILITY OF FINITE BUFFER SYSTEM

To establish an accurate estimate of the cell loss probability of a finite
buffer we start with the derivation of a Correction Factor (CF) to adjust the
tail approximation that has been developed. Therefore, we write the CLP
estimate, $P_{loss}$, as follows:

$$P_{loss} = CF(\cdot) \cdot P_{tail} \tag{17}$$

where CF(.) is a general function of the source and network parameters.

Since we want to relate the infinite buffer analysis to the respective finite 
buffer, we start with the simplest situation that is well known in the literature, 
namely the Markovian M/M/1 and M/M/1/B systems, for which the governing 
results are well established. 

We denote the correction factor that relates the infinite queue to the 
corresponding finite queue for the case of Markovian queues by Basic Correction 
Factor, $CF_{Basic}$, which can be approximated, for large buffer size, as follows: 

$$CF_{Basic} = \frac{1 - \rho}{\rho} \qquad (18)$$ 

In (Bisdikian et al. 1993), a similar result is also provided for the M/G/1/K 
and /D/1/K queueing models. The basic correction factor given by equation (18) 
is used in (Bruneel et al. 1996, Mignault et al. 1996) as the correction 
factor to adjust the tail. On the other hand, Takine et al. (Takine et al. 1994) 
proved that the CLP for a discrete finite buffer is related to the tail probability 
as: 

$$P_{loss} = \frac{1 - \rho}{\rho} \cdot \frac{P_{tail}}{1 - P_{tail}} \qquad (19)$$ 

For small $P_{tail}$, equation (19) provides the same correction factor as $CF_{Basic}$. 
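The relation between equations (18) and (19) can be checked on the simplest Markovian case. The following is only an illustrative sketch: it assumes the textbook M/M/1 and M/M/1/B closed forms, takes B as the total number of places in the system, and takes the tail as P(Q > B) = ρ^(B+1). The function names are hypothetical and are not taken from the paper.

```python
def mm1_tail(rho: float, B: int) -> float:
    """Tail probability P(Q > B) of the infinite-buffer M/M/1 queue."""
    return rho ** (B + 1)


def mm1b_loss(rho: float, B: int) -> float:
    """Exact loss probability of the M/M/1/B queue (B places in total)."""
    return (1.0 - rho) * rho ** B / (1.0 - rho ** (B + 1))


def clp_basic(rho: float, p_tail: float) -> float:
    """CLP estimate using CF_Basic = (1 - rho) / rho, equation (18)."""
    return (1.0 - rho) / rho * p_tail


def clp_takine(rho: float, p_tail: float) -> float:
    """CLP estimate using the relation of equation (19)."""
    return (1.0 - rho) / rho * p_tail / (1.0 - p_tail)


if __name__ == "__main__":
    B = 20
    for rho in (0.3, 0.6, 0.9):
        tail = mm1_tail(rho, B)
        print(rho, mm1b_loss(rho, B), clp_basic(rho, tail), clp_takine(rho, tail))
```

Under these assumptions equation (19) reproduces the M/M/1/B loss exactly, while equation (18) differs from it only by the factor 1 - P_tail, which is negligible when the tail is small.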



6.1 Proposed Correction Factor 

To gain some insight into the effect of all parameters on the correction factor, we 
investigated the effect of the source and network parameters on the accuracy 
of the CLP estimation. We considered burst length, burstiness, load, buffer 
size, and the ratio of service rate to the peak rate of the sources. In all situations 
we investigated the effect of each parameter on the CLP as estimated from 
the tail modified by $CF_{Basic}$, compared to the simulation results. 

In Figure 5 to Figure 8 we plot the cell loss probability obtained from the tail 
approximation adjusted by $CF_{Basic}$, compared with simulation. In these 
figures we examine the effect of burst length, burstiness, ratio of service rate to 
the peak rate, and load. In each case the other parameters are kept fixed. A 
careful inspection of these figures shows that, for burst length and burstiness, 
the CLP estimate is approximately accurate. This means that 
the burst length and burstiness do not have a clear effect on the correction 
factor. On the other hand, for the load and the ratio of service rate to peak rate, 
the cell loss estimate clearly deviates from simulation as the 
parameter changes. From this we concluded that the load and the ratio of 
service rate to source peak rate both have some effect on the correction factor 
and therefore deserve further investigation. 

Figure 5 Effect of burst length on the CLP using $CF_{Basic}$. Videotel sources, M = 20, ρ = 0.63 

Figure 6 Effect of burstiness on CLP using $CF_{Basic}$. Videotel sources, M = 20, ρ = 0.88 

Figure 7 Effect of the ratio of service rate to the source peak rate on CLP using $CF_{Basic}$. Voice sources, M = 40, ρ = 0.SS 

Figure 8 Effect of load on CLP using $CF_{Basic}$. Videotel sources, M = 20, B = 100 

After a wide number of trials we arrived to a conclusion that a suitable 
correction factor can be put in the form (1 — /S.bp, where f{p) is a 

general function of the load. Refer to (Belhaj 1998) for a detailed derivation 
of this correction factor. 

In order to get an accurate estimate of the CLP, the exact values of f(ρ) must 
be determined for different source and network parameters. One way is to 
use tabulated values for different cases. Using tabulated values would require 
a considerable amount of memory and some decision rule to select 
the value of f(ρ), because not all possible values could be stored in the table. 
Another possibility is to calculate a large range of values of f(ρ) and fit the 
data using some criterion. This is the approach we follow. 
By comparison with simulation we carried out numerical experiments for 
more than 30 settings. In each of these experiments we found the value of f(ρ) 
that produces the exact CLP estimate. The collected data is shown in Figure 9. 

Figure 9 Collected data for f(ρ) versus load 

Figure 10 The proposed correction factor as a function of M for different load values 

To fit the data we performed a linear regression (to simplify the calculation of 
the CLP) on the collected data and found that f(ρ) can be represented by a 
linear function of the load as follows: 

f{p) = 0.72p-M. (20) 

Besides the function f(ρ) given by equation (20), which we call the average f(ρ), 
we present two other estimates of f(ρ). The first, which we call the lower bound f(ρ), 
is given by f(ρ) = 0.72ρ - 0.12 and is tangent to the lower values of 
the collected data. The other, which we call the upper bound f(ρ), 
is given by f(ρ) = 0.72ρ and is tangent to the upper values of the collected 
data. By comparison with simulation we found that using either the average 
f(ρ) or the other two estimates does not change the cell loss estimate 
significantly. That is, from the network dimensioning point of view, any 
of the estimates of f(ρ) gives approximately the same results. 
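The fitting step described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes an ordinary least-squares line for the average estimate and obtains the lower and upper bounds by shifting that line until it touches the lowest and highest data points. The data points in the example are hypothetical placeholders, not the measurements of Figure 9.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bounding_intercepts(xs, ys, slope):
    """Intercepts of the lines with the given slope that touch the
    lowest and the highest data points (lower and upper bound)."""
    offsets = [y - slope * x for x, y in zip(xs, ys)]
    return min(offsets), max(offsets)


if __name__ == "__main__":
    # Hypothetical (load, f) pairs used only to exercise the functions.
    loads = [0.3, 0.5, 0.6, 0.7, 0.8, 0.9]
    f_values = [0.15, 0.33, 0.36, 0.47, 0.50, 0.62]
    slope, intercept = fit_line(loads, f_values)
    lower, upper = bounding_intercepts(loads, f_values, slope)
    print(slope, intercept, lower, upper)
```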

Therefore, we can write the overall Proposed Correction Factor, $CF_{proposed}$, 
as follows: 

C Fproposed — ~ p) ’ ^ ) (^1) 

where f(ρ) is given by equation (20) or one of the other two proposals. 

A closer look at the correction factor given by equation (21) leads to a 
conclusion that explains the relation between the tail and the CLP. The result 
necessitates a correction to the statement 
that the tail underestimates the loss for small load and overestimates it for 
high load, as emphasised by Bisdikian et al. (Bisdikian et al. 1993) 
and Mignault et al. (Mignault et al. 1996). We now have the following result: 
the tail may overestimate or underestimate the cell loss probability depending 
on the load and the ratio of the service rate to the source peak rate. For small 
values of load the tail usually underestimates the cell loss, and for high values 
of the load the tail usually overestimates the cell loss, but for moderate loads 
the tail may overestimate or underestimate the cell loss depending on the ratio 
of the service rate to the peak rate of the sources. See (Belhaj 1998) for a further 
explanation of this result. Finally, according to the correction factor given by 
equation (21), the overall estimate of cell loss probability is given by: 

$$P_{loss} = CF_{proposed} \cdot P_{tail} \qquad (22)$$ 

7 SIMULATION AND NUMERICAL EVALUATIONS 

In this section we give further results from the simulation and numerical 
evaluations that were carried out to test the accuracy of the proposed estimate 
of the cell loss probability given by equation (22). In this investigation, 
we compare the proposed algorithm and four other on-line 
algorithms proposed in the literature against simulation. These are the algorithm 
proposed by Sohraby, which is based on an approximation of the tail of a discrete 
time queueing model (Sohraby 1993); the algorithm of Ishizaki et al., which is based on 
the evaluation of the CLP using a discrete time model (Ishizaki et al. 1995); 
the algorithm of Lee and Mark, which is based on an approximate tail derived 
using a fluid flow model (Lee et al. 1995); and the bufferless approximation 
model proposed by Hsu and Walrand, which is based on a large deviation 
approximation (Hsu et al. 1996). 

We start with voice sources, which are mainly characterised by a 
small peak rate and small burst length. In Figure 11 and Figure 12 we show 
the CLP versus buffer size and number of sources respectively. In these two 
figures the ratio of service rate to the source peak rate is 40. To check the 
results at higher values of M we present Figure 13, where M = 156. To test 
the accuracy of the algorithms when changing the parameter M, we include 
Figure 14, where the load is kept constant at 0.86 by changing the number 
of sources. From these figures for voice sources we see clearly that 
the proposed algorithm is very accurate and predicts approximately the same 
results as those given by simulation. 

The second set of experiments is done with videotel sources. In Figure 15 
and Figure 16 we present respectively the CLP versus buffer size and number 
of sources. Here, with videotel sources, we again obtain very accurate 
results using the proposed algorithm. Special attention should be given to 
Figure 15, where M is small: here all the other algorithms provide a very 
conservative prediction of the cell loss probability. 

In the third set of experiments we consider image sources, which are 
characterised mainly by very high burstiness (burstiness = 23). In Figure 17 and 
Figure 18 we see the CLP versus buffer size and number of sources respectively. 
From these figures we find that the proposed algorithm is also accurate. Other 
results for HDTV and MPEG are given in (Belhaj 1998). 

Figure 11 Cell loss probability versus buffer size. Voice sources, M = 40, ρ = 0.7. 

Figure 12 Cell loss probability versus number of sources. Voice sources, B = 100, M = 40. 

Figure 13 Cell loss probability versus buffer size. Voice sources, M = 156, ρ = 0.82. 

Figure 14 Cell loss probability versus M. Voice sources, B = 100, ρ = 0.86. 

Figure 15 Cell loss probability versus buffer size. Videotel sources, M = 5, ρ = 0.6. 

Figure 16 Cell loss probability versus number of videotel sources. B = 100, M = 20. 

Figure 17 Cell loss probability versus buffer size. Image sources, M = 15, ρ = 0.58. 

Figure 18 Cell loss probability versus number of sources. Image sources, B = 100, M = 15. 



8 SERVING REAL-TIME TRAFFIC EFFICIENTLY 

The usual approach to bandwidth assignment is that, according to the 
state of the system, each class is assigned a fixed bandwidth for the whole 
duration of the assignment period, and that bandwidth is made available to 
the present connections of that class whatever the resulting QoS, 
i.e. the connections present in any class can use all the bandwidth assigned 
to that class. Because of the nature of ATM, which is clear from the word 
"Asynchronous", the connections of any class that are present at any time may 
consume much more bandwidth than is required to satisfy their QoS 
if they are allowed to do so. This happens whenever the number of connections 
in a class falls below the number that should be served by the assigned 
bandwidth. This is another possible cause of inefficient utilisation of the 
bandwidth assigned to a class. The ideal solution to this problem is to allow 
the connections present at any time to consume just the amount of bandwidth 
they need according to their QoS requirements. It may be argued 
that this can be achieved by reassigning the bandwidth whenever a 
connection terminates or a new connection is accepted. This is not suitable 
because it makes the assignment too frequent, so that the time since the 
previous assignment is not enough to carry out the calculations required 
for the overall bandwidth assignment operation. The scheme we propose to 
solve this problem is based on differentiating between the assigned 
bandwidth and the available bandwidth. 

• The Assigned Bandwidth: 

By the assigned bandwidth we mean the maximum bandwidth that the 
connections of a certain class can use during the assignment period. This 
bandwidth cannot be assigned to any other class during this assignment 
period. The assigned bandwidth is determined by optimising 
some cost function. A connection is accepted only if the assigned 
bandwidth is enough to serve this new connection together with the present 
connections of the class, i.e. the assigned bandwidth determines the CAC action. 
The assigned bandwidth is updated at the start of a new assignment period. 

• The Available Bandwidth: 

By the available bandwidth we mean the bandwidth that can be used by 
the present connections of the class, and this should be just the 
bandwidth necessary to satisfy the required performance. The available bandwidth 
is determined whenever a new connection requests admission or a 
connection terminates. Therefore it is updated according to the connection 
activity (Belhaj 1998). When a connection is accepted (according to the 
assigned bandwidth) the available bandwidth is increased by just the amount 
needed to satisfy all the present connections of the class. When a connection 
terminates the available bandwidth is re-evaluated accordingly. Therefore, 
connections of each class will not be served with over-satisfied QoS. The 
difference between the assigned bandwidth and the available bandwidth is 
made available to the best-effort traffic, and hence bandwidth 
utilisation is maximised. 
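The bookkeeping implied by the two definitions above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `ClassBandwidth` and the callable `required_bw`, which maps a number of connections to the bandwidth needed to meet their QoS, are hypothetical. In the scheme of this paper that mapping would come from the CLP estimate, whereas the example below uses a simple linear placeholder.

```python
from typing import Callable


class ClassBandwidth:
    """Tracks assigned and available bandwidth for one traffic class."""

    def __init__(self, assigned: float, required_bw: Callable[[int], float]):
        self.assigned = assigned        # fixed for the assignment period
        self.required_bw = required_bw  # hypothetical QoS-to-bandwidth mapping
        self.connections = 0
        self.available = 0.0            # bandwidth the class may actually use

    def admit(self) -> bool:
        """CAC decision: accept only if the assigned bandwidth suffices."""
        needed = self.required_bw(self.connections + 1)
        if needed > self.assigned:
            return False
        self.connections += 1
        self.available = needed
        return True

    def release(self) -> None:
        """Re-evaluate the available bandwidth when a connection terminates."""
        if self.connections == 0:
            raise ValueError("no connection to release")
        self.connections -= 1
        self.available = self.required_bw(self.connections)

    def best_effort_share(self) -> float:
        """Bandwidth left over for best-effort traffic."""
        return self.assigned - self.available


if __name__ == "__main__":
    # Placeholder mapping: 2 bandwidth units per connection.
    cls = ClassBandwidth(assigned=10.0, required_bw=lambda n: 2.0 * n)
    for _ in range(3):
        cls.admit()
    print(cls.available, cls.best_effort_share())
```

Keeping the assigned bandwidth fixed between assignment periods means that only the inexpensive `required_bw` evaluation runs on each connection event, which is the motivation given in the text for separating the two quantities.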



9 NUMERICAL EVALUATIONS 





Figure 19 Cell loss probability versus number of connections less than 50 

Figure 20 Bandwidth saving versus number of connections less than 50 

Here we give an example to see how much the dynamics of the connections 
affect the CLP that the connections are served with. We assume that different 
classes have e = 10"*^. We assign a bandwidth to each class such 50 connec- 
tions are served by the required CLP. Then we keep the assigned bandwidth 
constant and decrease the number of connections present in each class. For 
the data source the parameters are Rp = 10 Mbps, burstiness 10 and Ta = 330. 
In Figure 19 we plot the CLP with which the connections are served versus the 
number of connections less than 50. We see clearly that the connections are 
served with over-satisfied QoS, and that this effect increases as the connections 
become less bursty. For example, if the number of videotel connections is decreased by 6 
then the present connections will be served with 10“^® instead of 10“^. In 
Figure 20 we see the saved bandwidth for the same situation as in Figure 19. 



10 CONCLUSIONS 

In this paper we introduced a very simple CAC that is suitable for providing 
guaranteed QoS for real-time traffic. Scheduling disciplines other than FIFO, 
together with CAC, can provide the required performance. Although peak rate 
allocation is very simple, it considerably limits the number of real-time 
traffic connections. This may not be desirable for the network provider 
because real-time traffic is expected to provide more revenue than best-effort 
traffic. Therefore, bandwidth allocation for real-time traffic should be based 
on algorithms that take into account the effect of statistical multiplexing for 
small buffer sizes. In this paper we proposed such a simple and efficient 
algorithm for CAC. Using simulation and numerical evaluations we compared 
several algorithms available in the literature and validated the accuracy of the 
proposed algorithm. 

Abstract 

Three reservation algorithms for the limited-availability system are proposed and 
compared in the paper. These algorithms can be used for call blocking 
equalisation in outgoing links of multiservice switching networks. Two 
approximate methods of equalised blocking probability calculation in the system 
with limited availability are proposed. Results of analytical calculations are 
compared with results of digital simulation of limited availability groups and 
switching networks with reservation. This research has confirmed the high 
accuracy of the proposed calculation methods. The formulae derived by the 
authors can be useful for the analysis and design of ISDN and B-ISDN systems. 

Keywords 

Bandwidth reservation, blocking probability, multi-rate model, switching network. 




1 INTRODUCTION 



Basic problems associated with the description of future B-ISDN (Broadband 
Integrated Services Digital Network) systems result from the necessity of servicing 
various types of traffic sources by the network. In principle, the classification of 
traffic sources in a broadband network is reduced to distinguishing the CBR 
(Constant Bit Rate) sources and the VBR (Variable Bit Rate) sources. 

To define the loads introduced into networks by the VBR sources, it is proposed 
to determine the so-called equivalent bandwidth for particular classes of traffic 
streams generated by the sources (COST 224, 1992), (COST 242, 1996). The 
assignment of several constant bit rates to the VBR sources enables the evaluation 
of traffic characteristics of switching systems in the B-ISDN network by means of 
multi-rate models worked out for the multi-rate circuit switching. The multi-rate 
system services independent call demands with an integer number of basic 
bandwidth units. In a circuit-switched network, the bandwidth unit is well defined 
as a time-slot or channel. In (COST 242, 1996), (Bean, 1994), (Kawashima, 1986), 
(Komer, 1989), (Lindberger, 1987), (Roberts, 1983), (Takagi, 1988), (Theberge, 
1995), (Tran-Gia, 1993) the multi-rate models are used to calculate the blocking 
probability in the full-availability group with reservation. 

In the B-ISDN network the reservation mechanism is combined with the CAC 
function (Call Admission Control) (COST 224, 1992), (COST 242, 1996). The 
effectiveness of CAC function depends on the adopted access control strategy for 
different calls. One of the possible strategies is bandwidth reservation that assures 
maximum equalisation of blocking probability in a system for all streams of 
offered traffic (Roberts, 1983), (Tran-Gia, 1993). 

The full-availability group is a discrete link model that uses complete sharing 
policy. In (Kaufman, 1981) and (Roberts, 1981) it has been proved that the multi- 
dimensional service process occurring in the full-availability group can be reduced 
to the one-dimensional Markov chain. Such reduction is the base for the 
determination of occupancy distribution in the group by means of a simple 
recurrent formula which is known as the Kaufman-Roberts recursion. 

The calculation algorithm of more complicated systems consists in the 
approximation of a multi-dimensional service process by the one-dimensional 
Markov chain characterised by a product form solution (Beshai, 1988), (Roberts, 
1983), (Stasiak, 1993a), (Stasiak, 1993b). Such an approach leads to a simple 
formula for the recurrent calculation of the occupancy distribution in multi-rate 
systems. This formula is a generalisation of the Kaufman-Roberts recursion. 

In Section 2 the generalised Kaufman-Roberts recursion has been analysed. The 
full-availability group with reservation has been described in Section 3. In Section 
4 a model of the limited-availability group with reservation has been considered. 
Limited-availability groups were the subject of many professional analyses, e.g. 
(Conradt, 1985), (Karlsson, 1991), (Button, 1984), (Ramaswami, 1985). In 
(Stasiak, 1993b), a simple approximate method of calculating the blocking 
probability in these groups is quoted. Solutions associated with reservation 
algorithms in the limited-availability group have not been proposed. The problem 
of introducing the reservation mechanism to this group is important. The outgoing 
links of switching networks can be treated as limited-availability groups (Stasiak, 
1996). Thus, solutions concerning these groups are relevant for determining the 
conditions of blocking probability equalisation in switching networks. 

The main aim of the present paper is to elaborate reservation algorithms for 
the limited-availability group. In Section 4 three reservation algorithms in the 
limited-availability group are proposed. These algorithms ensure 
blocking equalisation in the system for all traffic streams. 

In Section 4 two approximate methods of blocking probability calculation in the 
limited-availability group with reservation have been derived. The results of 
analytical calculations have been compared with the results of digital simulation. 
In Section 5 problems of external blocking probability equalisation in switching 
networks have been discussed. Section 6 concludes the paper. 



2. GENERALISED KAUFMAN-ROBERTS MODEL OF THE 
MULTI-RATE SYSTEM 

Let us consider a multi-rate system with a capacity of V bandwidth units. The 
system services M independent classes of Poisson traffic streams with the 
intensities $\lambda_1, \lambda_2, \ldots, \lambda_M$. The holding time for the calls of particular classes has 
an exponential distribution with the parameters $\mu_1, \mu_2, \ldots, \mu_M$. The class i call 
requires $t_i$ bandwidth units to set up a connection. Thus the mean traffic offered to 
the system by the class i traffic stream is equal to: 

$$a_i = \lambda_i / \mu_i \qquad (1)$$ 

2.1. Basic recurrence relations 

The state of the system is determined by an ordered set: $Q = \{x_1, \ldots, x_M\}$. Each 
of those elements is equal to the number of given-class calls carried by the system. 
The probability of this state is designated by the symbol $p(x_1, \ldots, x_M)$. The total 
number of busy bandwidth units in the system is equal to: 

$$n = \sum_{i=1}^{M} x_i t_i \qquad (2)$$ 

The probability of n basic bandwidth units being busy is denoted by the symbol 
P(n). Figure 1 shows a fragment of the multi-dimensional process occurring in the 
system under consideration. 

Figure 1 Multi-dimensional Markov process in the multi-rate system. 

Let us consider the local equation of equilibrium 
associated with the class i stream (the states designated with a dashed line in Figure 1): 

$$x_i \mu_i \, p(x_1, \ldots, x_i, \ldots, x_M) = \lambda_i \, \sigma_i(x_1, \ldots, x_i - 1, \ldots, x_M) \, p(x_1, \ldots, x_i - 1, \ldots, x_M) \qquad (3)$$ 

where $\sigma_i(x_1, \ldots, x_i - 1, \ldots, x_M)$ is the conditional probability of passing between 
adjacent states of the process associated with the class i stream. The value of the 
parameter $\sigma_i$ can change, depending on the state of the system. So, various 
solutions are possible for equation (3), and consequently, the local birth and death 
processes associated with the relevant streams of calls will be mutually dependent. 
In (Beshai, 1988), (Stasiak, 1993a), (Stasiak, 1993b) it was assumed that those 
dependencies are negligible when the following assumptions are fulfilled: 

$$\sigma_i(x_1, \ldots, x_M) = \sigma_i(n) \qquad (4)$$ 

The assumption (4) means that $\sigma_i(n)$ does not depend on the division of the 
busy units between particular classes of calls. The assumption (5) means that 
$\sigma_i(n)$ is a slowly-varying function of n. Under such circumstances we can assume 
(Beshai, 1988) that the mutual dependence of service processes of particular 
classes is negligible. The assumptions (4) and (5) allow us to sum unconditionally 
all the M type (3) equations for the state $Q = \{x_1, \ldots, x_M\}$ (Figure 1). Taking 
the formula (1) into account, as the result of such a procedure we obtain: 

$$n \, p(x_1, \ldots, x_M) = \sum_{i=1}^{M} a_i t_i \, \sigma_i(n - t_i) \, p(x_1, \ldots, x_i - 1, \ldots, x_M) \qquad (6)$$ 

Now, summing both sides of (6) for all sets Q satisfying the equation (2) we 
finally obtain the so-called generalised Kaufman-Roberts recursion (Beshai, 
1988), (Roberts, 1983), (Stasiak, 1993a): 

$$n P(n) = \sum_{i=1}^{M} a_i t_i \, \sigma_i(n - t_i) \, P(n - t_i) \qquad (7)$$ 

It has been mentioned that P(n) is the state probability in the multi-rate system, 
and $\sigma_i(n)$ is the conditional (state-dependent) probability of passing between the 
adjacent states of the process associated with the class i call stream. Thus, the 
blocking probability b(i) for the class i stream can be written as follows: 

$$b(i) = \sum_{n=0}^{V - t_i} P(n) \, [1 - \sigma_i(n)] + \sum_{n=V - t_i + 1}^{V} P(n) \qquad (8)$$ 



On the basis of the equations (7) - (8) we can approximately calculate the blocking 
probability in a state-dependent multi-rate system. However, for this purpose the 
probabilities $\sigma_i(n)$ have to be determined. It should be emphasised that the 
accuracy of calculating the distribution (7) depends on the level of accordance 
between the determined parameter $\sigma_i(n)$ and the assumptions (4) and (5). 

It should be mentioned here that the presented analytical approximation (7) of 
the occupancy distribution in a multi-rate system results from the reduction of the 
multi-dimensional Markov process in the multi-rate system (Figure 1) to the 
one-dimensional approximate Markov chain (Figure 2). The diagram presented in 
Figure 2 corresponds to the generalised Kaufman-Roberts recursion (7) for the 
system with two call streams (M = 2, $t_1 = 1$, $t_2 = 2$). The $y_i(n)$ symbol denotes the 
reverse transition rates of a class i service stream outgoing from state n. 

If the probabilities of passing $\sigma_i(n)$ are equal to one for all states, equations (7) 
are reduced to the Kaufman-Roberts recursion (Kaufman, 1981), (Roberts, 1981): 

$$n P(n) = \sum_{i=1}^{M} a_i t_i \, P(n - t_i) \qquad (9)$$ 

Equation (9) determines exactly the occupancy distribution in the full-availability 
group with different multi-rate traffic streams. The full-availability group is a 
discrete link model that uses complete sharing policy (COST 242, 1996). This 
system is an example of a state-independent system in which the passage between 
two adjacent states of the process associated with a given class stream does not 
depend on the number of busy bandwidth units in the system. 
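Recursion (9) translates directly into a short routine. The sketch below is illustrative only; the function names are not from the paper. It computes unnormalised values starting from P(0) = 1, normalises them so that they sum to one, and then obtains the class i blocking probability as the probability of the states with fewer than t_i free units, which is what equation (8) reduces to when all passing probabilities equal one.

```python
def kaufman_roberts(V, a, t):
    """Occupancy distribution P(0..V) of the full-availability group.

    V -- capacity in bandwidth units
    a -- offered traffic a_i of each class
    t -- bandwidth units t_i demanded by a call of each class
    """
    q = [0.0] * (V + 1)
    q[0] = 1.0
    for n in range(1, V + 1):
        # n * q(n) = sum_i a_i * t_i * q(n - t_i), equation (9)
        q[n] = sum(a_i * t_i * q[n - t_i]
                   for a_i, t_i in zip(a, t) if n >= t_i) / n
    total = sum(q)
    return [x / total for x in q]


def blocking_probability(P, V, t_i):
    """Probability that fewer than t_i units are free."""
    return sum(P[V - t_i + 1:])


if __name__ == "__main__":
    V, a, t = 10, [3.0, 1.0], [1, 2]
    P = kaufman_roberts(V, a, t)
    for i, t_i in enumerate(t, start=1):
        print("class", i, blocking_probability(P, V, t_i))
```

With a single class demanding one unit per call the result coincides with the Erlang B formula, which gives a convenient check.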

2.2. Reverse transition rates 

A part of the one-dimensional Markov chain diagram constructed for two call 
streams according to equation (7) is shown in Figure 2. The $y_i(n)$ symbol denotes 
the reverse transition rates of a class i service stream outgoing from state n. These 
transition rates are equal to the average number of class i calls serviced in state n. 
Each state of the one-dimensional Markov chain in the multi-rate system 
(Figure 2) satisfies the following state equation: 

$$P(n) \left[ \sum_{i=1}^{M} a_i t_i \, \sigma_i(n) + \sum_{i=1}^{M} t_i \, y_i(n) \right] = \sum_{i=1}^{M} a_i t_i \, \sigma_i(n - t_i) \, P(n - t_i) + \sum_{i=1}^{M} t_i \, y_i(n + t_i) \, P(n + t_i) \qquad (10)$$ 

It follows from equation (7) that the sum of service streams outgoing from a state 
n is equal to n: 

$$n = \sum_{i=1}^{M} t_i \, y_i(n) \qquad (11)$$ 

According to formulae (7) and (11), equation (10) can be rewritten as follows: 

$$\sum_{i=1}^{M} a_i t_i \, \sigma_i(n) \, P(n) = \sum_{i=1}^{M} t_i \, y_i(n + t_i) \, P(n + t_i) \qquad (12)$$ 

Expression (12) is the equation of statistical equilibrium between the total stream 
outgoing from state n towards higher states and the total service stream entering 
state n from higher states. This equation holds only when the local balance 
equations for call streams of particular traffic classes are satisfied (Kaufman, 
1981), (Stasiak, 1993a), (Stasiak, 1993b): 

$$a_i t_i \, \sigma_i(n) \, P(n) = t_i \, y_i(n + t_i) \, P(n + t_i) \qquad (13)$$ 

Based on equation (13), the reverse transition rates are equal to: 

$$y_i(n + t_i) = \begin{cases} a_i \, \sigma_i(n) \, P(n) / P(n + t_i) & \text{for } n + t_i \le V \\ 0 & \text{for } n + t_i > V \end{cases} \qquad (14)$$ 

Formula (14) determines the average number of class i calls serviced in the state 
$n + t_i$ of the multi-rate system. For $\sigma_i(n) = 1$ formula (14) designates the reverse 
transition rates in the full-availability group. 

2.3. The full-availability group with reservation 

Let us consider a full-availability group with bandwidth reservation. This system 
allows us to equalise the blocking probability for all classes of traffic streams. For 
this purpose, the reservation threshold $Q_i$ for each traffic class is designated. The 
parameter $Q_i$ determines the borderline state of a system, in which servicing class 
i calls is still possible. All states higher than $Q_i$ belong to the so-called reservation 
space $R_i$, in which class i calls will be blocked: 

$$R_i = V - Q_i \qquad (15)$$ 

According to the equalisation rule (COST 242, 1996), (Roberts, 1983), 
(Tran-Gia, 1993), the blocking probability in the full-availability group will be the same 
for all call stream classes if the reservation threshold for all traffic classes is 
identical and equal to the difference between the total capacity of a group and the 
value of resources required by the call of maximum demands $t_{max}$: 

$$Q = V - t_{max} \qquad (16)$$ 

The method of blocking probability calculation in the full-availability group with 
reservation, proposed in (Roberts, 1983), determines state probabilities by the 
recursion (7), in which the conditional probabilities $\sigma_i(n)$ are equal to: 

$$\sigma_i(n) = \begin{cases} 1 & \text{for } n \le Q \\ 0 & \text{for } n > Q \end{cases} \qquad (17)$$ 

The equalised blocking probability for all call classes can be calculated as follows: 

$$b(i) = \sum_{n=Q+1}^{V} P(n) \qquad (18)$$ 

Figure 3 Markov chain conforming to full-availability group with reservation. 

A diagram of the one-dimensional Markov chain according to the generalised 
Kaufman-Roberts recursion (7) with the probabilities of passing determined by 
equation (17) is shown in Figure 3. The diagram corresponds to the 
full-availability group with reservation. This group is offered two classes of traffic 
streams with bandwidth requirements $t_1 = 1$ and $t_2 = 2$. The reverse transition rates 
are not marked in Figure 3 because these parameters are not necessary for the 
occupancy distribution calculation with the help of the equation (7). The 
calculation method of the reverse transitions has been presented in Section 2.2. 
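The calculation for the full-availability group with reservation can be sketched as follows. This is an illustrative reading of recursion (7) with the passing probabilities of equation (17) and the blocking probability of equation (18); the function names and the example traffic values are assumptions made for the sketch.

```python
def generalised_kaufman_roberts(V, a, t, sigma):
    """Occupancy distribution from the generalised recursion (7).

    sigma(i, n) is the passing probability of class i (0-based index)
    in state n.
    """
    q = [0.0] * (V + 1)
    q[0] = 1.0
    for n in range(1, V + 1):
        q[n] = sum(a_i * t_i * sigma(i, n - t_i) * q[n - t_i]
                   for i, (a_i, t_i) in enumerate(zip(a, t))
                   if n >= t_i) / n
    total = sum(q)
    return [x / total for x in q]


def reservation_sigma(Q):
    """Equation (17): calls are admitted only in states n <= Q."""
    return lambda i, n: 1.0 if n <= Q else 0.0


def equalised_blocking(P, Q):
    """Equation (18): probability of the states above the threshold."""
    return sum(P[Q + 1:])


if __name__ == "__main__":
    V, a, t = 10, [3.0, 1.0], [1, 2]
    Q = V - max(t)  # equalisation rule (16)
    P = generalised_kaufman_roberts(V, a, t, reservation_sigma(Q))
    print(equalised_blocking(P, Q))
```

Passing `sigma` as a function keeps the recursion itself unchanged when a different state-dependent system is considered; only the passing probabilities are swapped.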



4. THE LIMITED-AVAILABILITY GROUP WITH RESERVATION 
4.1. Limited-availability group model 

The limited-availability group is a group divided into identical subgroups. The 
system services a call only when this call can be entirely carried by the resources 
of an arbitrary single subgroup. Let us consider a limited-availability group 
characterised by the following structural parameters (see Figure 4): 
k - number of subgroups in a group, f - capacity of a subgroup (the number of 
basic bandwidth units in a subgroup), V - total capacity of a group (V = kf). 

Figure 4 Limited-availability group 

In (Stasiak, 1993b) a simple approximate method of calculating the blocking 
probability in these groups is proposed. According to this method, the state 
probabilities in the limited-availability group are approximated by the generalised 
Kaufman-Roberts recursion (7). The probability of passing $\sigma_i(n)$ in recursion (7) 
can be calculated as follows: 

$$\sigma_i(n) = [F(V - n, k, f) - F(V - n, k, t_i - 1)] / F(V - n, k, f) \qquad (19)$$ 

where F(x, k, f) is the number of arrangements of x free bandwidth units in k 
subgroups, calculated with the assumption that the capacity of each subgroup is 
limited to f bandwidth units. The value of parameter F(x, k, f) can be calculated by 
the following combinatorial formula: 

$$F(x, k, f) = \sum_{r=0}^{\lfloor x/(f+1) \rfloor} (-1)^r \binom{k}{r} \binom{x + k - 1 - r(f + 1)}{k - 1} \qquad (20)$$ 



The accuracy of the calculation of the occupancy distribution in the 
limited-availability group (formulae (7) and (19)) depends on the degree of accordance of 
the passing probabilities $\sigma_i(n)$ with the assumptions (4) and (5). The simulation tests 
(Stasiak, 1993b) carried out for various structures of limited-availability groups 
and for different mixtures of multi-rate traffic streams have confirmed the high 
calculation accuracy of the formulae (7), (19) and (20). In practice, when 
/ > (where is the value of resources, required by a call of maximum 
demands), the results of calculations made in accordance with the mentioned 
formulae to determine the occupancy distribution in the limited-availability group 
can be regarded as accurate. 
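Formulae (19) and (20) can be evaluated with a few lines of code. The sketch below is illustrative: the function names are assumptions, and formula (20) is read as the standard inclusion-exclusion count of the ways of placing x free units in k subgroups of capacity f.

```python
from math import comb


def arrangements(x, k, f):
    """F(x, k, f): arrangements of x free units in k subgroups,
    each subgroup holding at most f free units (formula (20))."""
    if x < 0:
        return 0
    total = 0
    for r in range(x // (f + 1) + 1):
        # comb(k, r) is zero for r > k, so surplus terms vanish.
        total += (-1) ** r * comb(k, r) * comb(x + k - 1 - r * (f + 1), k - 1)
    return total


def passing_probability(n, V, k, f, t_i):
    """sigma_i(n) of formula (19) for n busy units out of V = k * f."""
    free = V - n
    all_states = arrangements(free, k, f)
    # Arrangements in which no subgroup has t_i free units are blocking.
    blocking_states = arrangements(free, k, t_i - 1)
    return (all_states - blocking_states) / all_states


if __name__ == "__main__":
    k, f = 2, 5
    V = k * f
    for n in range(V + 1):
        print(n, passing_probability(n, V, k, f, t_i=2))
```

The resulting function can be supplied as the passing probability in recursion (7) to obtain the occupancy distribution of the limited-availability group.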

4.2. Reservation algorithms in limited-availability group 

The blocking equalisation rule (16) does not take effect in the case of the limited- 
availability group because of its structure and state-dependent processes occurring 
in the system. In this Section three reservation algorithms in the limited- 
availability group are proposed. These algorithms make it possible to equalise well 
the blocking probabilities for all classes of offered traffic. 

Algorithm I 

In this algorithm we introduce the reservation threshold Q for all call classes 
except for the oldest class M (i.e. the one which requires the greatest number of 
bandwidth units to set up a connection). This means that only class M 
calls can be serviced by the system in the states belonging to the reservation space 
R. Thus the probability $\sigma_i(n)$ in recursion (7) can be expressed as follows: 

$$i \ne M \Rightarrow \sigma_i(n) = \begin{cases} [F(V - n, k, f) - F(V - n, k, t_i - 1)] / F(V - n, k, f) & \text{for } n \le Q \\ 0 & \text{for } n > Q \end{cases} \qquad (21)$$ 

$$i = M \Rightarrow \sigma_i(n) = [F(V - n, k, f) - F(V - n, k, t_i - 1)] / F(V - n, k, f) \quad \text{for each } n \qquad (22)$$ 

Figure 5 Markov chain in limited-availability group (reservation algorithm I). 

The $\sigma_i$ parameters in the states which do not belong to the reservation
space are calculated in the same way as in the case of the limited-availability
group without reservation. In the reservation space the conditional probabilities
of passing are equal to zero, except for the oldest class stream. Thus, the
blocking probability for class $M$ calls can be determined directly by (8). The
blocking probability for class $i$ ($i \neq M$) calls can be calculated as follows
(here $R = V - Q$ is the size of the reservation space, so the second sum covers
the states $n > Q$):

$$B(i) = \sum_{n=0}^{V-R} P(n)\,[1 - \sigma_i(n)] + \sum_{n=V-R+1}^{V} P(n) \qquad (23)$$



A diagram of the Markov process according to the recurrence equation (7) for
the limited-availability group with two offered call streams ($t_1 = 1$,
$t_2 = 2$) is shown in Figure 5. This diagram is appropriate for the reservation
algorithm under consideration. The problem is to find the value of the
reservation threshold $Q$ for which the blocking probabilities of all traffic
streams are equalised. The solution can be obtained by an iterative method,
according to the following plan. We take the group capacity $V$ as the first
value of the reservation threshold $Q$. Then, on the basis of the assumed $Q$
value, we calculate the occupancy distribution (equations (7), (21) and (22))
and the blocking probability for all traffic streams (equations (8) and (23)).
Finally, we calculate the value $|B(i) - B(j)|$ for each pair of traffic streams
$i$, $j$. If $|B(i) - B(j)| / B(i)$ is smaller than the assumed relative error,
$Q$ is the value of the reservation threshold which causes blocking
equalisation. Otherwise the value of the $Q$ parameter is decreased by one, and
the calculation cycle is repeated. The number of steps is equal to the value of
the reservation threshold.
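The iterative plan above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' program: recursion (7) and formula (8) are not reproduced in this section, so the sketch assumes they have the generalised Kaufman-Roberts form $n P(n) = \sum_i a_i t_i \sigma_i(n - t_i) P(n - t_i)$ and $B(i) = \sum_n P(n)[1 - \sigma_i(n)]$. All function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def arrangements(x, k, f):
    """F(x, k, f) from (20)."""
    if x < 0:
        return 0
    return sum((-1) ** i * comb(k, i) * comb(x + k - 1 - i * (f + 1), k - 1)
               for i in range(x // (f + 1) + 1))


def passing(n, t, V, k, f):
    """Passing probability without reservation: the share of arrangements of
    the V - n free units in which some subgroup has at least t free units."""
    free = V - n
    total = arrangements(free, k, f)
    return (total - arrangements(free, k, t - 1)) / total


def blocking_algorithm_1(a, t, k, f, Q):
    """Blocking probability of every class for threshold Q (algorithm I)."""
    V = k * f
    classes = range(len(t))
    top = max(classes, key=lambda i: t[i])  # class M, the largest demand

    def sigma(i, n):
        # (21): classes other than M are refused in the states n > Q
        if i != top and n > Q:
            return 0.0
        return passing(n, t[i], V, k, f)    # (22) and the states n <= Q

    # ASSUMED form of recursion (7), generalised Kaufman-Roberts
    p = [1.0] + [0.0] * V
    for n in range(1, V + 1):
        p[n] = sum(a[i] * t[i] * sigma(i, n - t[i]) * p[n - t[i]]
                   for i in classes if n >= t[i]) / n
    norm = sum(p)
    # ASSUMED form of (8); with sigma = 0 above Q it reduces to (23)
    return [sum(p[n] * (1.0 - sigma(i, n)) for n in range(V + 1)) / norm
            for i in classes]


def equalising_threshold(a, t, k, f, rel_err):
    """Lower Q from V one step at a time until all classes are equalised."""
    V = k * f
    for Q in range(V, -1, -1):
        B = blocking_algorithm_1(a, t, k, f, Q)
        if all(abs(bi - bj) <= rel_err * bi for bi in B for bj in B):
            return Q, B
    return None  # no threshold meets the requested relative error
```

For the group studied in this section one would call `equalising_threshold([48.0, 24.0, 8.0], [1, 2, 6], 4, 30, rel_err)` with a relative error of one's own choosing. Because $Q$ is an integer, a very tight tolerance may not be met by any threshold, in which case the sketch returns `None`.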

In Figure 7, calculation results are compared with digital-simulation results for
a limited-availability group characterised by the parameters $V=120$, $k=4$ and
$f=30$. Three classes of multi-rate traffic are offered to this group:
($a_1 = 48$ Erl, $t_1 = 1$), ($a_2 = 24$ Erl, $t_2 = 2$) and ($a_3 = 8$ Erl,
$t_3 = 6$). The computational time for one $Q$ value does not exceed 1 sec.
(PC Pentium 133). The simulation results for one $Q$ value have been obtained
for five series with 100 000 arrivals of class 3 in each of them. The estimated
simulation time was about 20 min. The simulation was carried out for random and
sequential hunting strategies of subgroups.

The simulation results are obtained with 95% confidence interval. The values of 
that interval are not plotted on the diagram for better clarity. However, for each 
case the interval is at least one order smaller than the simulation result. All the 
results are expressed in relation to the value of reservation space R. 

In view of our research, it can be stated that for $R = 4$ the reservation
algorithm in the limited-availability group brings about blocking probability
equalisation for the first ($t_1 = 1$) and the second ($t_2 = 2$) class of
traffic. The blocking probability $b(3)$ of the call stream of highest demands
(i.e. the one that requires the greatest number of bandwidth units to set up a
connection) decreases when the value of $R$ increases. This phenomenon is caused
by servicing only the third class calls in the reservation space. Thus the
system has fewer remaining free bandwidth units for streams of lower demands
(the first and the second class streams).

This is the reason why the blocking probabilities for streams of lower demands
($b(1)$, $b(2)$) increase. The blocking probabilities of all traffic classes are
equalised for $R = 11$. Above this value, the blocking probability for streams of
lower demands exceeds the blocking probability for the third class stream.

Algorithm II

In this algorithm we introduce the reservation threshold $Q$ for all classes of
call streams. This means that calls of all classes cannot be engaged by the
system in the states belonging to the reservation space $R$. Thus, the
conditional passage probabilities in the recursion (7) can be expressed as
follows:

$$\sigma_i(n) = \begin{cases} [F(V-n,k,f) - F(V-n,k,t_i-1)] / F(V-n,k,f) & \text{for } n \le Q \\ 0 & \text{for } n > Q \end{cases} \qquad (24)$$

The $\sigma_i$ parameters in the states which do not belong to the reservation
space are calculated in the same way as in the case of the limited-availability
group without reservation. In the reservation space the probabilities of passing
are equal to zero. The blocking probability for all call classes can be
calculated by the formula (23).

Figure 6 Markov chain in limited-availability group (reservation algorithm II).

A diagram of Markov process in a limited-availability group carrying two call 
streams is shown in Figure 6. This diagram is appropriate to the reservation 
algorithm under consideration. The reservation threshold Q inducing the blocking 
probability equalisation can be found in the same iterative way as in the case of 
algorithm I described above. 

In Figure 8 calculation results are compared with digital simulation results for a
limited-availability group with the parameters determined above: $V=120$, $k=4$,
$f=30$, $a_1=48$ Erl, $t_1=1$, $a_2=24$ Erl, $t_2=2$, $a_3=8$ Erl and $t_3=6$.
With respect to our research, it can be stated that for $R>4$ algorithm II leads
to blocking probability equalisation for the two lower call streams ($t_1=1$,
$t_2=2$). Blocking equalisation for all call classes takes place for $R>15$,
which is a greater reservation space than in the case of algorithm I. The calls
of the highest demands are not serviced in the reservation space. As a result,
the blocking probability for the third class stream decreases more slowly than
in the case of algorithm I. In algorithm I, the calls of the third class are
serviced in the reservation space. This means that the system which uses
algorithm I is "broadened" for the third class calls. The inaccessibility of the
reservation space in the case of algorithm II causes the system to become more
and more "narrowed" for calls of all classes when the $R$ parameter increases.
Due to this phenomenon, the value of the equalised blocking probability
increases slowly for $R>15$.

Algorithm III 

It is possible to determine other reservation algorithms in the limited-availability
group. Let us consider one of them. The proposed algorithm assumes that a
reservation threshold equal to $Q = f - t_{\max}$ is designated for all traffic
streams in a certain number of subgroups. Thus, in the proposed algorithm we can
distinguish the subgroups (from among $k$ subgroups) in which the reservation
mechanism is introduced. The simulation results for the limited-availability
group characterised by the parameters determined above ($V=120$, $k=4$, $f=30$,
$a_1=48$ Erl, $t_1=1$, $a_2=24$ Erl, $t_2=2$, $a_3=8$ Erl, $t_3=6$) are shown in
Figure 9. The results are expressed in relation to index $j$. This index
determines the number of subgroups in which the reservation threshold is
established. The reservation mechanism introduced into several subgroups causes
the system to service more third class calls, and the blocking probability
$b(3)$ decreases. Thus, the first and the second classes receive fewer free
bandwidth units, and the blocking probabilities $b(1)$ as well as $b(2)$
increase. If reservation thresholds are introduced into all $k$ subgroups, the
blocking probability of all traffic classes is equalised.

Figure 7 Blocking probability in the limited-availability group (algorithm I).
Calculations: line, Simulation: x random strategy, o sequential strategy.

Figure 8 Blocking probability in the limited-availability group (algorithm II).
Calculations: line, Simulation: x random strategy, o sequential strategy.

Figure 9 Blocking probability in the limited-availability group (algorithm III).
Simulation: random hunting strategy.

The elaboration of a simple method of calculating the blocking probability
in the limited-availability group operating with reservation algorithm III is not
possible. Contrary to reservation algorithms I and II, the introduction of the
reservation threshold into several subgroups does not mean that the conditional
probabilities of passing $\sigma_i(n)$ take the value of zero in the reservation
space $R$. These parameters are different from zero and from the values
determined by equation (19). This phenomenon results from the possibility of
carrying call streams by the subgroups in which the reservation mechanism does
not exist. Thus, the diagram of the Markov process shown in Figure 2 is
appropriate to the reservation algorithm under consideration.

The simulation research carried out by the authors indicates that in the case of
introducing the reservation threshold into each subgroup the equalised blocking
probability for algorithm III can be approximated by the method developed for
algorithm II (for the smallest value of reservation space $R$ for which the
blocking probabilities are equalised). For the majority of cases, the error of
such an approximation does not exceed 5%.

From an engineer's viewpoint algorithm III can be considered the most
effective one. In an ATM network, the CAC function tests the possibility of
setting up a new connection at designated links or virtual paths. Thus, it seems
to be technically sound to join the reservation mechanism with the CAC function
which checks the state of a given link or virtual path. Moreover, this solution
allows the network operator to introduce the reservation mechanism into
particular ATM links in accordance with the network configuration.



5 SWITCHING NETWORKS WITH RESERVATION

The third reservation algorithm can have a practical meaning for external 
blocking probability equalisation in multi-service switching networks. The 
outgoing link groups of switching networks can be treated as limited-availability 
group, in which particular links are regarded as subgroups (Stasiak, 1996). 
Figure 10 shows a three-stage switching network. The outgoing multiplexed 
transmission links create link groups called directions. The outgoing links can be 
wired to the directions in different ways. In Figure 10 each first link of each last- 
stage switch belongs to the first direction. Analogously, each link n of each last- 
stage switch belongs to the same direction with serial number equal to n. 

Blocking events in the switching network are the sum of internal and external
blocking events. An internal blocking event is defined as the impossibility of
setting up a connection between a given input and output link (or direction) of a
switching network. In turn, an external blocking event for a class $i$ stream
appears when no link of the given direction can carry a class $i$ call. In this
paper only external blocking events are considered.

Figure 10 A three-stage switching network.

On the grounds of the limited-availability group definition (Section 4.1), it can
be stated that the direction of the switching network can be regarded as the
limited-availability group if only one link chosen arbitrarily from among other
links belonging to this direction can carry a given class call. According to the
third reservation algorithm, the reservation of bandwidth units in each
outgoing link of the direction leads to the equalisation of external blocking
probabilities in this direction.

Figure 11 shows the simulation results of the total blocking probability in the
switching network without reservation for all offered traffic streams. The
simulations were carried out for a three-stage switching network consisting of
digital switches of $k \times k$ links. Each link has a capacity of $f$ bandwidth
units. The outgoing links create $k$ identical directions, each with a capacity
equal to $k$ links (Figure 10). Thus, the total capacity of a direction expressed
in bandwidth units is equal to $V = kf$. The simulations have been made for a
switching network with the parameters $k = 4$ and $f = 30$. In this network three
classes of multi-rate traffic are offered in the following proportions:
$a_1 : a_2 : a_3 = 6:3:1$. The numbers of bandwidth units demanded by calls of
particular classes are: $t_1 = 1$, $t_2 = 2$, $t_3 = 6$.

The simulation results are obtained with 95% confidence interval. The results
are expressed in relation to the value of traffic offered to a single basic
bandwidth unit, given by the formula:

$$a = \sum_{i=1}^{M} a_i t_i / V \qquad (25)$$
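As a quick arithmetic check of (25), the sketch below converts between the per-class loads and the load per bandwidth unit, using the Section 4 example values ($a = (48, 24, 8)$ Erl, $t = (1, 2, 6)$, $V = 120$). The function names are hypothetical and introduced only for this illustration.

```python
def load_per_unit(a, t, V):
    """Traffic offered to a single bandwidth unit, formula (25)."""
    return sum(ai * ti for ai, ti in zip(a, t)) / V


def class_loads(a_unit, ratios, t, V):
    """Inverse of (25): per-class loads a_i in the given proportions
    that together offer a_unit Erlangs per bandwidth unit."""
    scale = a_unit * V / sum(r * ti for r, ti in zip(ratios, t))
    return [scale * r for r in ratios]


a_unit = load_per_unit([48.0, 24.0, 8.0], [1, 2, 6], 120)   # (48 + 48 + 48) / 120
loads = class_loads(a_unit, [6, 3, 1], [1, 2, 6], 120)      # back to 48, 24, 8
```

With these values each class contributes the same 48 bandwidth units of offered load, which is what the 6:3:1 proportion of $a_i$ implies for demands of 1, 2 and 6 units.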

Figure 12 shows the simulation results of the blocking probability in the switching
network considered under the assumption that the third reservation algorithm is
introduced into particular directions of the switching network. In consequence,
blocking probabilities for all class streams approach one another. In comparison
with Figure 11, the blocking probability $b(3)$ of the call stream of the highest
demands decreases. This phenomenon is caused by carrying only the third class
calls in the reservation space of each outgoing link. Thus the direction has fewer
remaining free bandwidth units for streams of lower demands (the first and the
second class streams) and the blocking probabilities $b(1)$ as well as $b(2)$ for
streams of lower demands increase. The utilisation of the third algorithm in the
outgoing directions of a switching network brings about the blocking probability
equalisation for the first and the second class of traffic. Contrary to the
limited-availability group, the introduction of the third reservation algorithm
into the outgoing directions of a switching network does not lead to the blocking
probability equalisation for all traffic streams. Slight differences between the
blocking probability $b(3)$ and the equalised probabilities $b(1)$, $b(2)$
(Figure 12) result from the internal blocking of the switching network. In the
structure of the switching network under consideration the internal blocking
probability is negligible. Generally, however, equalisation of total blocking
probability requires introduction of the reservation mechanism into inter-stage
links of switching networks. The problem of internal blocking probability
equalisation in switching networks is not considered in this paper.

Figure 11 Blocking probability in the switching network without reservation.
Simulation: x random hunting strategy, o sequential hunting strategy of links.
Calculations (limited-availability group without reservation): line.

Figure 12 Blocking probability in the switching network with reservation.
Simulation: x random hunting strategy, o sequential hunting strategy of links.
Calculations (limited-availability group with reservation, algorithm II): line.

To obtain a qualitative comparison of the limited-availability group with the
outgoing direction of a switching network, Figures 11 and 12 also show the
results of blocking probability calculations in the limited-availability group
with (and without) reservation. These calculation results were obtained on the
basis of the methods discussed in Section 4.1 and Section 4.2.

In view of our research, it can be stated that the switching network characterised
by a negligible number of internal blocking events can be calculated on the basis
of limited-availability group models with (or without) reservation.



6 CONCLUSION 

In the paper three reservation algorithms for the limited-availability group have
been proposed. The first two algorithms (I, II) assume independent strategies of
subgroups occupation. They require one reservation mechanism for the system.
The difference between algorithm I and II results from the determination of the 
reservation threshold for a traffic stream of the oldest class. In algorithm I this 
reservation threshold has not been introduced. In algorithm II, the reservation 
threshold for the oldest traffic class has been equal to the reservation threshold for 
other traffic classes. Algorithm III requires one reservation mechanism for each 
subgroup of the system. All algorithms proposed have led to the blocking 
probability equalisation in the limited-availability group. 

In the paper, two methods of equalised blocking probability calculations have 
been derived; one method used for algorithm I and the other for algorithm II. The 
simulation tests have confirmed the validity of all the assumptions used in 
proposed methods. Equalised blocking probability in the limited-availability group 
resulting from algorithm III (reservation threshold is introduced into each 
subgroup) can be approximated by the method developed for algorithm II. 

The proposed reservation algorithms can be useful for obtaining external
blocking probability equalisation in the outgoing links of multi-service switching
networks. From an engineer's viewpoint, algorithm III, assuming the introduction
of the reservation mechanism into each subgroup, is the simplest one.

Abstract

The focus of this paper is to investigate the effects of various traffic and switch 
characteristics on multiplexing gains and their implications for different band- 
width allocation and admission control algorithms. We show that the total 
multiplexing gain due to the independent combination of identical sources 
can be resolved into two factors, one expressing the advantage gained (by 
means of buffering) from the statistical rate variations within a source and 
the other expressing the efficiency of statistical multiplexing of i.i.d. streams 
(gain across sources) . Both simulation and theoretical analysis illustrate that 
although bursty sources require more bandwidth, multiplexing gains are in- 
creasing with burstiness. The effective bandwidth approach is found to work 
well in the region with high buffer size to burst length ratio, high source uti- 
lization and small number of sources, whereas the Gaussian approximation 
performs well in the region with small buffer size to burst length ratio, high 
source utilization and large number of sources. Finally, we also give some 
quantitative information related to self-similar traffic, i.e., FBM. It is shown 
that even for LRD traffic with high-H values, it is possible to obtain higher 
multiplexing gains when a large number of independent sources with the same 
Hurst parameter are multiplexed for combined transmission. 

Keywords 

ATM, multiplexing gains, long-range dependence, admission control 




1 Introduction 



The emerging high-speed asynchronous transfer mode (ATM) networks are
expected to support a wide range of telecommunication services such as voice,
data, video and image transfer, with different traffic characteristics and
quality of service (QOS) requirements. In ATM, bandwidth allocation deals with
determining the amount of bandwidth required by a connection for the network
to provide the required QOS. There are two alternative approaches for
bandwidth allocation: deterministic multiplexing and statistical multiplexing.
In deterministic multiplexing, each connection is allocated its peak bandwidth.
Doing so causes a large amount of bandwidth to be wasted for bursty
connections, particularly for those with large peak-to-average bit rate ratios.
This goes against the philosophy of the ATM framework since it does not take
advantage of the multiplexing capability of ATM and restricts the utilization of
network resources. An alternative method is statistical multiplexing. In this
scheme, a multiplexing gain is achieved as the capacity allocated to a group
of bursty traffic streams is lower than the sum of their peak rates. Hence,
statistical multiplexing allows more connections to be multiplexed in the
network than deterministic multiplexing, thereby allowing better utilization of
network resources.

Since the resource allocation algorithms will be used by network control func- 
tions such as connection admission control and network routing, the real-time 
requirements necessitate that the complexity of these algorithms should be 
kept low while still taking into account the characteristics and the desired 
QOS of the connections. The exact solutions are either intractable or when 
available, are computationally too complex to meet the real-time require- 
ments. Therefore approximations have to be made. These resource allocation 
schemes can be divided into two main categories. The first category consists 
of those which take the buffering in the switches/multiplexers into account. 
However, in order to meet the real-time computation requirements, they fail 
to take into consideration the effects of statistical multiplexing across the 
sources sharing the buffer. A typical example of this kind of bandwidth allo- 
cation scheme is the well known effective bandwidth. The second category is 
of those which treat the switches/multiplexers as bufferless entities but take 
the statistical multiplexing between the sources into account. The Gaussian 
approximation approach falls into this category. The focus of this paper is to 
investigate the effects of various traffic characteristics and switch parameters 
on multiplexing gains and their implications for different bandwidth allocation 
and admission control algorithms. 

Previous work [15] [16] on this subject has adopted a simplified bufferless
model and assumed that the traffic sources are of Gaussian distribution.
Although in that case explicit formulas are available, insights into the
relationship between multiplexing gains and various traffic and switch
characteristics seem to be restricted. In this paper, we show that the total
multiplexing gain due to the independent combination of identical sources can be
resolved into two factors, one expressing the advantage gained (by means of
buffering) from the statistical rate variations within a source and the other
expressing the efficiency of statistical multiplexing of i.i.d. streams (gain
across sources). Both simulation and theoretical analysis illustrate that
although bursty sources require more bandwidth, multiplexing gains increase
with burstiness. The effective bandwidth approach is found to work well in the
region with high buffer size to burst length ratio, high source utilization and
small number of sources, whereas the Gaussian approximation performs well in
the region with small buffer size to burst length ratio, high source
utilization and large number of sources.



2 The Model 

Consider an ATM multiplexer queue with buffer size $B$ fed by $N$ i.i.d. ON/OFF
fluid flow sources. For exponentially distributed ON and OFF periods, a
source is completely characterized by three parameters, namely the peak rate
$R$, the utilization $\rho$ and the mean burst length $b$. Let $x$ be the
normalized buffer size with respect to burst length, i.e., $x = B/b$. This
two-state fluid model has been chosen to capture traffic from diverse
applications like variable bit rate (VBR) video, voice and data communications
[13]. The multiplexing gain $G$, which is specified by the allowable QOS
parameter, is the bandwidth saving due to statistical multiplexing over the case
in which peak rate allocation is used. It is defined as follows:

$$G = NR/C, \qquad (1)$$

where $C$ is the link bandwidth needed to meet the desired QOS (cell loss ratio,
cell delay, jitter, etc.) for the multiplexed stream of $N$ sources. Here we use
the buffer overflow probability (or cell loss probability) as the QOS parameter
and denote it by $\epsilon$.

The maximum possible value of gain is obtained when admission control is
based on average bandwidth assignment, i.e., $C = NR\rho$. Therefore, the
maximum multiplexing gain $G$ is given by $G = 1/\rho$. Intuitively, this means
that for highly bursty traffic, with $\rho \ll 1$, $G = 1/\rho$ can be quite
large. However, this maximum gain cannot be attained in reality, because the
average bandwidth assignment method is unacceptable in terms of cell loss.



3 Simulation and Analysis



3.1 The Effective Bandwidth Approach 



Assume the notion of effective bandwidth is used to determine the bandwidth
needed to meet the desired QOS for a single source. It is denoted by $e$ and
satisfies $m < e < R$, where $m = R\rho$ is the mean rate of the source. Then
$G$ can be written as:

$$G = \frac{NR}{C} = \frac{R}{e} \cdot \frac{Ne}{C} = G_1 G_2 \qquad (2)$$

where $G_1 = R/e$ is the statistical gain in a single source, and $G_2 = Ne/C$ is
the multiplexing gain across sources.



To compute the effective bandwidth $e$, Guerin et al. [7] propose a simple and
straightforward method which is based on conservative estimates of the buffer
overflow probability $\epsilon$. It includes the effect of the access buffer, but
ignores the effect of statistical multiplexing between sources. The effective
bandwidth for an individual connection is given as:

$$e = R\,\frac{\alpha(1-\rho)R - x + \sqrt{[\alpha(1-\rho)R - x]^2 + 4x\alpha\rho(1-\rho)R}}{2\alpha(1-\rho)R} \qquad (3)$$

where $\alpha = \ln(1/\epsilon)$. For multiple sources, the same expression as in
(3) can be used such that the following condition is satisfied:



e = 




(4) 



Note that, in general, the effective bandwidth is found to provide a
conservative estimate of the bandwidth requirements of various sources,
especially when the number of sources is large. That is why $G_2$ in (2) is
expected to be greater than 1. Yet it is a useful tool in many situations
because of its additive property, as shown by (4). More details on this method
can be found in [7].
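The single-source quantities above are easy to evaluate numerically. The sketch below assumes that (3) is the formula of Guerin et al. [7] rewritten in terms of the normalised buffer size $x = B/b$; the function names and the value of $\epsilon$ used in the example are hypothetical choices for illustration only.

```python
import math


def effective_bandwidth(R, rho, x, eps):
    """Effective bandwidth e of one ON/OFF source, assumed form of (3).

    R: peak rate, rho: utilization (0 < rho < 1),
    x: buffer size divided by mean burst length, eps: overflow probability.
    """
    alpha = math.log(1.0 / eps)
    y = alpha * (1.0 - rho) * R
    return R * (y - x + math.sqrt((y - x) ** 2 + 4.0 * x * rho * y)) / (2.0 * y)


def single_source_gain(R, rho, x, eps):
    """G_1 = R / e, the statistical gain within a single source, see (2)."""
    return R / effective_bandwidth(R, rho, x, eps)


# Example with R = 1, rho = 0.1, x = 7.5 and an illustrative eps
e_example = effective_bandwidth(1.0, 0.1, 7.5, 1e-6)
g1_example = single_source_gain(1.0, 0.1, 7.5, 1e-6)
```

Because $e$ does not depend on $N$, the allocation for $N$ identical sources under this approach is simply $Ne$, which is the additive property referred to above.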

Next we present and discuss a number of numerical examples that illustrate
the relationship between multiplexing gains and various traffic parameters.
The exact bandwidth $C$ in (1) is computed by iteratively solving the
differential equations associated with the underlying queueing system [2]. There
are a number of variables that directly impact the multiplexing gain of
connections. We express $G_1$, $G_2$ and $G$ as functions of the following
parameters: the source utilization $\rho$ or burstiness $1/\rho$, the ratio of
buffer size to burst length $x$, and the number of sources $N$. Without loss of
generality, we assume $R = 1$.
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Figure 1: Effect of source utilization on multiplexing gains. N = 50, x = 
7.5, € = 10“^. dashed line; Gi, dotted line: G 2 , solid line: G 



3.1.1 Effect of Source Utilization 



The first set of examples illustrates the effect of the source utilization on the
multiplexing gains. In Figure 1 and Figure 2 we plot $G_1$, $G_2$, $G$ as a
function of $\rho$ varying from 0.02 to 0.8. The other parameters are as shown
in the figures. Obviously all the gains are decreasing functions of $\rho$. When
$\rho \to 1$, i.e., all $N$ connections have a constant bit rate $R$, all three
gains tend to 1. It is easy to check that as $\rho \to 1$, the limit of $e$ in
(3) is $R$. However, the case $\rho \to 0$ exhibits two possible limits,
depending on the sign of the quantity $(\alpha R - x)$:

$$\lim_{\rho \to 0} e = \begin{cases} 0, & \text{if } \alpha R < x \\ R - x/\alpha, & \text{if } \alpha R > x \end{cases} \qquad (5)$$

Intuitively, (5) states that as $\rho \to 0$, the effective bandwidth also goes
to 0 only if the buffer is large enough compared to the mean burst size, i.e.,
the buffer should be able to hold $\alpha$ bursts of average size. When the
buffer size is not sufficient, the effective bandwidth has a nonzero limit
since, although bursts are less and less frequent, the service rate must still
handle large bursts whenever they arrive [7]. In Figure 1, since
$\alpha R > x$, $G_1$ does not grow significantly as $\rho \to 0$. In Figure 2,
since $\alpha R < x$, we observe a sharp increase of $G_1$ as $\rho \to 0$.
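The two limits in (5) can be checked in a few lines. The derivation below assumes that (3) is the formula of Guerin et al. [7] written in terms of the normalised buffer size $x = B/b$; the shorthand $y$ is introduced here only for the derivation.

```latex
% Assumed form of (3), with the shorthand y = \alpha (1 - \rho) R
e = R\,\frac{y - x + \sqrt{(y - x)^2 + 4 x \rho y}}{2y},
\qquad y = \alpha (1 - \rho) R .

% As \rho \to 0 we have y \to \alpha R and the term 4 x \rho y vanishes, so
\lim_{\rho \to 0} e
  = R\,\frac{(\alpha R - x) + |\alpha R - x|}{2 \alpha R}
  = \begin{cases}
      0,            & \alpha R < x \\
      R - x/\alpha, & \alpha R > x
    \end{cases}
```

In the second case the limiting gain $G_1 = R/e$ stays bounded at $R/(R - x/\alpha)$, which is consistent with the modest growth of $G_1$ reported for Figure 1.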

$G_2$ increases with decreasing $\rho$, since the effective bandwidth approach
becomes more and more conservative. This can be readily explained from the
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Figure 2: Effect of source utilization on multiplexing gains. N = 50, a: = 
8.0, e = 10“^. dashed line: Gi, dotted line: G 2 , solid line: G 



origin of effective bandwidth. It is well known that the probability that the
buffer content exceeds the buffer size $x$, $H(x)$, has the following asymptotic
approximation [2]:

$$H(x) \sim D \exp(z_0 x) = A_N (N\rho R/C)^N \exp(z_0 x) \qquad (6)$$

where $A_N$ can be computed from all the negative eigenvalues of the queueing
system, and $z_0$ is the largest negative eigenvalue. In general, $D$ is a value
different from and sometimes significantly smaller than 1. Because the effective
bandwidth method is based on the approximation that the prefactor
$D = A_N (N\rho R/C)^N$ equals 1 and $H(x) = \exp(z_0 x)$, it will become more
and more inaccurate as the source utilization decreases. It is clear that $G$
increases when sources become more bursty.

3.1.2 Effect of Number of Sources 

In Figure 3, $G_1$, $G_2$, and $G$ are plotted as a function of the number of
sources, $N$. As $e$ in (3) is independent of $N$, $G_1$ remains constant as $N$
varies. It can be seen from (6) that as $N$ increases, $D$ is expected to drop
rapidly below 1. Therefore the effective bandwidth approach becomes more
conservative and the gain accrued by combining and smoothing the source cell
arrivals at the buffer, $G_2$, increases. A higher gain $G$ is possible if the
number of sources increases.
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Figure 3: Effect of number of sources on multiplexing gains, p = 0.1, x = 
7.5, e = 10“^. dashed line: Gi, dotted line: G 2 , solid line: G 



3.1.3 Effect of Buffer to Burst Size Ratio 

In Figure 4, $G_1$, $G_2$, and $G$ are plotted as a function of buffer size to
burst length, $x$. As intuitively expected, $e$ in (3) can be easily found to be
the mean bit rate $\rho R$ and the peak bit rate $R$ when $x \to \infty$ and
$x \to 0$, respectively. Therefore $G_1$ is a monotone increasing function of
$x$, ranging from 1 to $1/\rho$. On the other hand, since
$H(x) \to \exp(z_0 x)$ when $x$ becomes so large that $z_0 x \ll -1$ and
$H(x) \ll 1$, the effective bandwidth is more and more accurate as $x$
increases. So $G_2$ is a monotone decreasing function of $x$. In general,
when we use larger buffers, the total gain achieved, $G$, is higher.

3.2 The Gaussian Approximation Approach 

When the effect of statistical multiplexing is of significance, the distribution
of the stationary aggregate traffic rate can be rather accurately approximated
by a Gaussian distribution [7]. Let $M = Nm = NR\rho$ and
$\sigma = R\sqrt{N\rho(1-\rho)}$ be the mean and standard deviation of the
aggregate traffic of $N$ sources respectively. Then the bandwidth required to
meet the desired QOS $\epsilon$ is given by

$$g = M + k\sigma \qquad (7)$$
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Figure 4: Effect of buffer to burst size ratio on multiplexing gains, p — 
0.1, N = 50,6 = 10~^. dashed line: Gi, dotted line: G2, solid line: G 



where $k = \sqrt{-2\ln(\epsilon) - \ln(2\pi)}$. It is easy to check that when
$1 > \rho > \rho_0 = N/(N + k^2)$, $g > NR$, which is meaningless. So we modify
(7) to the following equation:

$$g = \min(M + k\sigma, NR) \qquad (8)$$

Again, $G$ can be resolved into two parts:

$$G = \frac{NR}{C} = \frac{NR}{g} \cdot \frac{g}{C} = G_3 G_4 \qquad (9)$$



where $G_3 = NR/g$ is the multiplexing gain across sources, and $G_4 = g/C$
is the statistical gain achieved by means of buffering. The problem with the
Gaussian approach is that it ignores the multiplexing buffer completely, and
relies on conservative bounds on the cell loss probability. That is why $G_4$ in
(9) is expected to be greater than 1.
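The Gaussian allocation and the gain $G_3$ follow directly from (7) and (8). The sketch below is only an illustration: the function names and the value of $\epsilon$ in the example are hypothetical. The threshold $\rho_0 = N/(N + k^2)$ used in `rho_zero` comes from solving $M + k\sigma = NR$ for $\rho$, i.e. $k^2\rho = N(1-\rho)$.

```python
import math


def gaussian_bandwidth(N, R, rho, eps):
    """Bandwidth g for N i.i.d. ON/OFF sources, capped at N*R as in (8)."""
    mean = N * R * rho
    sigma = R * math.sqrt(N * rho * (1.0 - rho))
    # k is real only for eps < 1 / sqrt(2 * pi)
    k = math.sqrt(-2.0 * math.log(eps) - math.log(2.0 * math.pi))
    return min(mean + k * sigma, N * R)


def gain_across_sources(N, R, rho, eps):
    """G_3 = N R / g from (9)."""
    return N * R / gaussian_bandwidth(N, R, rho, eps)


def rho_zero(N, eps):
    """Utilization above which the uncapped estimate (7) exceeds N*R."""
    k_squared = -2.0 * math.log(eps) - math.log(2.0 * math.pi)
    return N / (N + k_squared)


# G_3 grows with N towards 1/rho (here 10) for rho = 0.1, illustrative eps
gains = [gain_across_sources(N, 1.0, 0.1, 1e-6) for N in (10, 100, 10000)]
```

For utilizations above `rho_zero(N, eps)` the cap in (8) is active and $G_3$ equals 1, so the Gaussian approach then offers no gain over peak rate allocation.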



3.2.1 Effect of Source Utilization 

The first set of examples illustrates the effect of the source utilization on the
multiplexing gains. In Figure 5 and Figure 6 we plot $G_3$, $G_4$, and $G$ as a
function of $\rho$. The other parameters are as shown in the figures.
Interestingly, we can see quite different behaviour of $G_3$ and $G_4$ against
$\rho$ in these two figures. As $\rho \to 1$, the limit of $g$ in (7) is $NR$
and $G_3$ tends to 1. As $\rho \to 0$,
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Figure 5: Effect of source utilization on multiplexing gains. N = 60, x = 
5.0, e = 10“®. dashed line: G 3 , dotted line: G 4 , solid line: G 



the limit of g in (7) is 0 and G 3 tends to a very big number. Note that 
when N 00, po -> 1, so in this case G 3 is decreasing with p and unlikely 
to level off at 1. This reflects the fact that the Gaussian approximation is 
unreliable when the number of sources is not large enough, as is the case 
in Figure 6 (A^ = 5). In other words, the Gaussian approximation can only 
work well when sufficiently large number of sources are multiplexed together. 
The relationship between G 4 and p seems to be unclear and needs further 
investigation. 



3.2.2 Effect of Number of Sources 

In Figure 7, G3, G4, and G are plotted as a function of the number of sources, 
N. Obviously, G3 is a monotone increasing function of N, whereas G4 is 
a monotone decreasing function of N. As N → ∞, G3 → 1/ρ. This is 
consistent with the central limit theorem, i.e., as more and more i.i.d. sources 
are multiplexed together, the aggregate traffic is more Gaussian. 



3.2.3 Effect of Buffer to Burst Size Ratio 

In Figure 8, G3, G4, and G are plotted as a function of the ratio of buffer size 
to burst length, x. As g in (7) is independent of x, G3 remains constant as x varies. 
On the other hand, since the Gaussian approximation fails to take into account 
the effect of the access buffer, there is a considerable gain G4, and this gain 
increases with buffer size. 

Figure 6: Effect of source utilization on multiplexing gains. N = 5, x = 
5.0, ε = 10^-[illegible]. Dashed line: G3, dotted line: G4, solid line: G 

Figure 7: Effect of number of sources on multiplexing gains, ρ = 0.1, x = 
7.5, ε = 10^-[illegible]. Dashed line: G3, dotted line: G4, solid line: G 

Figure 8: Effect of buffer to burst size ratio on multiplexing gains, ρ = 
0.1, N = 50, ε = 10^-[illegible]. Dashed line: G3, dotted line: G4, solid line: G 



4 Multiplexing Gains for Traffic with Long-Range Dependence 

In this section we summarize some results related to traffic with long-range 
dependence, which have been published in a previous paper [6]. Readers can 
refer to that paper for more details. 

Recent studies of real traffic data, mainly at Bellcore [11], have shown that 
Ethernet traffic cannot be sufficiently represented by traditional Markovian 
models, but instead can be more accurately matched by self-similar (fractal) 
models. More recently, variable-bit-rate (VBR) video traffic was also found 
to exhibit self-similar characteristics [3]. An important feature of self-similar 
processes is their long-range dependence (LRD), that is, their autocorrelation 
function decays more slowly than exponentially. This property of persistent 
correlation can be characterized by the Hurst parameter H, with H = 0.5 for 
Markovian streams and H > 0.5 for streams with LRD. Studies by Norros 
[12] and Erramilli et al. [5] suggest that LRD arrival processes could produce 
much higher queue lengths and delays than Markovian sources, and that 
traffic engineering formulas based on Markovian models could, in some cases, 
result in under-engineering. 

On the other hand, in their studies of the bandwidth needed on an ATM link 
carrying VBR video teleconference traffic, Heyman et al. [8] and Elwalid et 
al. [4] have shown that, even though the Hurst parameter of the VBR stream 
has been determined to be about 0.7 [3], effective bandwidth formulas derived 
for Markovian models are, in fact, successful in determining the bandwidth 
required to support LRD VBR streams. 

Thus, there exist two somewhat opposing reports on the effect of LRD on 
traffic engineering, both from Bellcore. Krishnan [9] explains this apparent 
contradiction with the help of a ‘crossover’ effect of the Hurst parameter, i.e., 
when a sufficiently large number of independent and identical sources are 
multiplexed, one can obtain a larger multiplexing gain with high-H sources than 
with low-H sources. In another work [10], Krishnan and Meempat demonstrate 
the crossover effect both for infinite and finite buffer queues with data 
traces of video teleconference. 

Considering an infinite buffer queue with fractional Brownian motion (FBM) 
input, Krishnan [9] utilizes the stationarity and scaling property of the 
buffer-level process to derive the crossover result. Although the derivation is 
straightforward, his main results are qualitative and the relationship between 
crossover buffer size, number of sources and the Hurst parameter is not clear. At least 
two questions remain unanswered: (i) For a specific LRD arrival process with 
H > 0.5, under what buffer sizes (time scale) can an appropriate Markovian 
model (H = 0.5) provide a good (conservative) prediction of the cell loss rate? 
(ii) How many identical LRD sources need to be multiplexed together to 
achieve a higher multiplexing gain than the Markovian model? Here we just 
consider the second issue, while both issues are investigated in [6]. Our 
analysis is based on the large deviations estimates of the overflow probabilities 
for the FBM traffic model. Explicit formulas have been derived to give some 
quantitative insights into the impact of the Hurst parameter and its crossover 
effect on traffic engineering. 

4.1 The Fractional Brownian Motion Model for Traffic 
with LRD 

The fractional Brownian motion model has been used in [12] to successfully 
characterize self-similar LAN traffic. Consider an FBM process Z(t) 
with Hurst parameter H ∈ [1/2, 1). It is a zero-mean non-stationary Gaussian 
process with stationary increments and covariance structure 

\[ \mathrm{Cov}(t, s) = \tfrac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right). \]

In the special case H = 1/2, Z(t) is the standard Brownian motion. The 
self-similar property of Z(t) is based on the fact that, for a > 0, Z(at) is 
identical in distribution to a^H Z(t). The increment process 
X(k) = Z(k + 1) − Z(k), k ≥ 0, is called fractional Gaussian noise (FGN) and 
is a stationary (discrete-time) Gaussian process with autocorrelation function 

\[ r(k) = \tfrac{1}{2}\left(|k+1|^{2H} - 2|k|^{2H} + |k-1|^{2H}\right), \qquad k \ge 1. \]

It is easy to see that, asymptotically, r(k) ~ H(2H − 1)|k|^(2H−2), i.e., X 
exhibits LRD [11]. 
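The slow decay of the FGN autocorrelation can be checked numerically. The sketch below evaluates the standard FGN autocorrelation and its power-law asymptote; the function names are illustrative only.

```python
def fgn_autocorr(k, hurst):
    """Autocorrelation r(k) of fractional Gaussian noise with Hurst parameter H."""
    k = abs(k)
    two_h = 2.0 * hurst
    return 0.5 * (abs(k + 1) ** two_h - 2.0 * k ** two_h + abs(k - 1) ** two_h)


def fgn_autocorr_asymptote(k, hurst):
    """Large-lag approximation H(2H - 1)|k|^(2H - 2)."""
    return hurst * (2.0 * hurst - 1.0) * abs(k) ** (2.0 * hurst - 2.0)


if __name__ == "__main__":
    for lag in (1, 10, 100, 1000):
        print(lag, fgn_autocorr(lag, 0.8), fgn_autocorr_asymptote(lag, 0.8))
```

For H = 0.5 the correlations vanish at every non-zero lag, which corresponds to the Markovian (uncorrelated increments) case; for H > 0.5 they decay as a power law and are not summable.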

An ATM multiplexer can be modelled as a single-server queue with constant 
service rate C and buffer size B. Assume there are a large number N of 
homogeneous self-similar input traffic streams. Let A(0,t] denote the 
distribution for the cumulative arrivals (cells or work) from each stream over the 
time interval (0,t]. A(0,t] can be constructed as follows [12]: 

\[ A(0, t] = mt + \sqrt{am}\, Z(t), \tag{10} \]

where m > 0 is the mean input rate, the scale factor a > 0 gives the 
variance/mean ratio for arrivals over one unit of the chosen time scale, and Z(t) 
is the above described FBM with Hurst parameter H. Also let b and c denote 
respectively the amounts of buffer space and bandwidth per source, so that 
B = Nb and C = Nc. 

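It has been shown in [6] that the buffer overflow probability for the above 
model can be given by 

\[ \varepsilon(B, H) = \Pr(Q > B) \approx \exp\left(-\frac{N^{2H-1}(c-m)^{2H} B^{2-2H}}{2\,\kappa(H)^{2}\, a\, m}\right), \tag{11} \]

where κ(H) = H^H (1 − H)^(1−H). 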
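The overflow estimate can be evaluated directly. The sketch below assumes the Norros-type large-deviations form for N homogeneous FBM sources, exp(−N^(2H−1) (c − m)^(2H) B^(2−2H) / (2 κ(H)² a m)) with κ(H) = H^H (1 − H)^(1−H); the function names and parameter values are illustrative assumptions.

```python
import math


def kappa(hurst):
    """kappa(H) = H^H (1 - H)^(1 - H)."""
    return hurst ** hurst * (1.0 - hurst) ** (1.0 - hurst)


def overflow_probability(n_sources, c, m, a, total_buffer, hurst):
    """Assumed large-deviations estimate of Pr(Q > B) for N FBM sources.

    c, m: bandwidth and mean rate per source; a: variance/mean ratio;
    total_buffer: shared buffer size B.
    """
    exponent = (n_sources ** (2.0 * hurst - 1.0)
                * (c - m) ** (2.0 * hurst)
                * total_buffer ** (2.0 - 2.0 * hurst)) / (
                    2.0 * kappa(hurst) ** 2 * a * m)
    return math.exp(-exponent)


if __name__ == "__main__":
    # Hypothetical values: per-source bandwidth 11, mean rate 10, a = 1, B = 10.
    for h in (0.5, 0.7, 0.9):
        print(h, overflow_probability(20, 11.0, 10.0, 1.0, 10.0, h))
```

Under this form the estimate does not depend on N when H = 0.5, while for H > 0.5 it decreases as more sources share the same buffer.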

4.2 The Hurst Parameter’s Crossover Effect on Performance 

The Hurst parameter of self-similar traffic has been widely regarded as a 
measure of burstiness, i.e., the higher the Hurst parameter, the burstier the 
traffic [11]. Starting from this point, one tends to conclude that traffic with 
LRD (H > 0.5) may result in much more severe performance degradation than 
traffic with Markovian structures (H = 0.5) [5]. Also, it has been claimed in 
the literature that the buffer behavior of LRD traffic cannot be accurately 
predicted by simple, parsimonious Markov-based models. However, it can be 
shown that a curious crossover property with respect to H for FBM traffic 
models exists and suggests that a high value of H does not necessarily imply 
that Markovian models will lead to under-engineering of bandwidth on ATM 
links. 

Consider now the gain achieved by statistically multiplexing a large number 
of independent and identical sources with LRD. We express this multiplexing 
gain in terms of the required bandwidth per stream, c0. Denote by ε the target 
overflow probability Pr(Q > B). Then from (11), we have 

\[ c_0(N, H) = m + \left(\kappa(H)\sqrt{-2\ln(\varepsilon)\, a\, m}\right)^{1/H} B^{-(1-H)/H}\, N^{(1-2H)/(2H)}. \tag{12} \]

Note that when H = 0.5, c0(N, H) is independent of N, which is consistent 
with the concept of effective bandwidth. 

Figure 9: Statistical multiplexing gain. Solid line: H = 0.7, dashed line: 
H = 0.8, dotted line: H = 0.9 

Figure 9 shows c0(N, H) vs. N for different values of H. In this experiment, 
ε = 10^-[illegible], B = 10, m = 10, a = 1. The graph shows that the bandwidth 
required per source decreases with increasing number of sources. The crossover 
effect is obvious: when the number of multiplexed sources is large, the 
multiplexing gain with high-H sources is larger than that with low-H sources, and 
the converse is true for a smaller number of sources. 
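The crossover can be reproduced numerically. The sketch below assumes that the required bandwidth per source has the Norros-type form c0 = m + (κ(H) √(−2 ln(ε) a m))^(1/H) B^(−(1−H)/H) N^((1−2H)/(2H)), obtained by solving the large-deviations overflow estimate for c; ε = 10^-6 is a hypothetical value, and B = 10, m = 10, a = 1 are used as in the experiment.

```python
import math


def kappa(hurst):
    """kappa(H) = H^H (1 - H)^(1 - H)."""
    return hurst ** hurst * (1.0 - hurst) ** (1.0 - hurst)


def bandwidth_per_source(n_sources, hurst, eps, total_buffer, m, a):
    """Assumed required bandwidth per source c0(N, H) for target overflow eps."""
    d = -2.0 * math.log(eps) * a * m
    return m + ((kappa(hurst) * math.sqrt(d)) ** (1.0 / hurst)
                * total_buffer ** (-(1.0 - hurst) / hurst)
                * n_sources ** ((1.0 - 2.0 * hurst) / (2.0 * hurst)))


if __name__ == "__main__":
    eps, buf, m, a = 1e-6, 10.0, 10.0, 1.0   # eps is a hypothetical choice
    for n in (1, 10, 100, 1000):
        row = [round(bandwidth_per_source(n, h, eps, buf, m, a), 3)
               for h in (0.7, 0.8, 0.9)]
        print(n, row)
```

With these values a single H = 0.9 source needs more bandwidth than a single H = 0.7 source, but the ordering is reversed once enough sources are multiplexed.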

Taking a closer look at Figure 9, we find that for different H, the numbers 
of sources at which the crossover happens are different. Although according 
to (12) there is no multiplexing gain across sources for H0 = 0.5, we still 
take H0 as a reference, and this does not affect our main results much. We 
want to investigate how many independent, identical streams with LRD (H > 
0.5) need to be multiplexed together to achieve a smaller c0 than streams with 
H0 = 0.5. Let us denote this crossover number of sources by Ncr. By using 
formula (12) and letting c0(N, H0) = c0(N, H), we have 

\[ N_{cr} = (-2\ln(\varepsilon)\, a\, m)^{-1} B^{2}\, \frac{[\kappa(H_0)]^{2H/(H_0-H)}}{[\kappa(H)]^{2H_0/(H_0-H)}} = (-2\ln(\varepsilon)\, a\, m)^{-1} B^{2}\, g(H), \tag{13} \]

where g(H) is a function of H defined by 

\[ g(H) = \frac{[\kappa(H_0)]^{2H/(H_0-H)}}{[\kappa(H)]^{2H_0/(H_0-H)}}. \tag{14} \]

Figure 10: g(H) vs. H 



As shown in Figure 10, g(H) is a monotone non-decreasing function of H. 
That means that, for LRD traffic with a higher value of H, more identical sources 
should be multiplexed together to achieve possible higher gains. In other 
words, when we try to use Markov models (H0 = 0.5) to provide a conservative 
estimate for the bandwidth needed for LRD traffic with known H > 0.5, the 
higher the Hurst parameter H, the larger the number of multiplexed sources 
for which this estimate works well. This is confirmed by the results of VBR 
video traffic in [4]. From the traffic control point of view, we may state that 
for long-range dependent traffic, increasing the buffer size has little impact on 
reducing cell loss rate, since an input process with LRD generates occasional 
bursts of traffic that cannot be absorbed even by very large buffers. On the 
other hand, statistically multiplexing several streams is a very efficient way to 
reduce loss while keeping utilization high. 
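The crossover number of sources can be computed and cross-checked against the per-source bandwidth. The sketch below assumes g(H) = κ(H0)^(2H/(H0−H)) / κ(H)^(2H0/(H0−H)) and Ncr = B² g(H) / (−2 ln(ε) a m), which follow from equating the assumed Norros-type bandwidth formula for H and H0; ε = 10^-6 and the other values are hypothetical.

```python
import math


def kappa(hurst):
    """kappa(H) = H^H (1 - H)^(1 - H)."""
    return hurst ** hurst * (1.0 - hurst) ** (1.0 - hurst)


def g_of_h(hurst, h0=0.5):
    """Assumed g(H); requires hurst != h0."""
    diff = h0 - hurst
    return kappa(h0) ** (2.0 * hurst / diff) / kappa(hurst) ** (2.0 * h0 / diff)


def crossover_sources(hurst, eps, total_buffer, m, a, h0=0.5):
    """Assumed crossover number of sources N_cr."""
    return total_buffer ** 2 * g_of_h(hurst, h0) / (-2.0 * math.log(eps) * a * m)


def bandwidth_per_source(n_sources, hurst, eps, total_buffer, m, a):
    """Assumed required bandwidth per source, used here only as a cross-check."""
    d = -2.0 * math.log(eps) * a * m
    return m + ((kappa(hurst) * math.sqrt(d)) ** (1.0 / hurst)
                * total_buffer ** (-(1.0 - hurst) / hurst)
                * n_sources ** ((1.0 - 2.0 * hurst) / (2.0 * hurst)))


if __name__ == "__main__":
    for h in (0.6, 0.7, 0.8, 0.9):
        print(h, g_of_h(h), crossover_sources(h, 1e-6, 10.0, 10.0, 1.0))
```

At N = Ncr the bandwidth per source for H and for H0 = 0.5 coincide, and g(H) grows with H, in line with the discussion above.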
Figure 11: Utilization vs. number of traffic streams, H = 0.6. Solid line: 
B = 100, dotted line: B = 1000 

In Figure 11 and Figure 12, the utilization level of the link as a function of 
the number of multiplexed streams is depicted. The traffic parameters 
are chosen as m = 1, a = 1, ε = 10^-[illegible]. It can be seen that there will be 
quite significant multiplexing gains if the traffic grows by aggregating more 
and more of the same type of traffic streams. These gains are more apparent 
for traffic with a larger value of H. As a consequence, there is no reason 
to believe that the self-similar property of traffic will make it difficult for 
networks to achieve high levels of utilization. Note that in [1], Addie has reached 
similar conclusions. 



5 Conclusion 

In this paper, we show that the total multiplexing gain due to the independent 
combination of identical sources can be resolved into two factors, one 
expressing the advantage gained (by means of buffering) from the statistical rate 
variations within a source and the other expressing the efficiency of statistical 
multiplexing of i.i.d. streams (gain across sources). The main findings of the 
paper are summarized in Table 1. The above results indicate that although 
bursty sources require more bandwidth, multiplexing gains are increasing with 
burstiness (the reciprocal of utilization). As the effective bandwidth approach 
is based on large buffer asymptotics and ignores the statistical multiplexing 
gain obtained by multiplexing different sources onto a single link, it can only 
work well in the region with high buffer size to burst length ratio, high source 
utilization and small number of sources. Meanwhile, it is found that the 
Gaussian approximation performs well in the region where little gain is 
obtained by having buffers in the system and there is a large number of sources. 

Figure 12: Utilization vs. number of traffic streams, H = 0.8. Solid line: 
B = 100, dotted line: B = 1000 

Table 1: Gains as functions of traffic parameters. I: increasing function; D: 
decreasing function; C: constant; N: non-monotonic 

Based on the above analysis, resource allocation and admission control schemes 
should take both sub-gains and the trade-off between them into consideration 
so as to achieve a higher total multiplexing gain. It has been shown in [14] 
that a single bandwidth allocation algorithm does not cover the whole region 
of traffic situations. Therefore a possible solution is the effective combina- 
tion of several methods which have strengths and limitations within different 
regions in the traffic space and complement each other. This is a promising 
direction of further study. 

We have extended our discussion to traffic that exhibits LRD. Thanks to large 
deviations theory, we extend Krishnan’s work by investigating the crossover 
property in more detail for the FBM model. The relationship between the crossover 
number of sources and the Hurst parameter is discussed. These results lead 
us to believe that even for LRD traffic with high H values, it is possible to 
obtain higher multiplexing gains when a large number of independent sources 
with the same Hurst parameter are multiplexed together. 

Abstract 

ABR was standardised by the ATM Forum in 1996. Source, destination 
and switch behaviours were specified. However, a lot of freedom was left to the 
switch manufacturers to implement an efficient algorithm compliant with the ABR 
specifications. There exist three different switch behaviours: binary 
switches, Relative Rate (RR) switches and Explicit Rate (ER) switches. 

In this paper, a new ER algorithm named ERAQLES is described. It is 
original because it uses the buffering capacities of the switches as well as a novel 
control function to derive an optimum explicit rate for the connections. The basic 
mechanisms of ERAQLES are presented and its performance is compared to 
ERICA and MACR solutions. 

The ability of these algorithms to achieve fairness in some interesting 
situations is investigated. We first consider the case of networking environments 
where both RR and ER switches are involved. Second, we explore the efficiency of 
these solutions when facing VBR traffic competing with ABR. Results show that, 
due to its design properties, ERAQLES outperforms ERICA in most situations. 



Keywords 

ABR, traffic management, explicit rate marking, relative rate marking, 
performance analysis 



1 INTRODUCTION 

Although it is a controversial solution, ABR can be considered as an interesting 
service to be provided by carriers to their users. Therefore, it is important to 
demonstrate that efficient and stable solutions can be designed. The ATM Forum 
ABR specifications (ATM Forum, 1996) include the description of the source, the 
destination, and the switch behaviour. However, different solutions are compliant 
with these specifications. A switch can implement three different mechanisms: 

• The simplest ABR implementation is called the binary switch behaviour: it sets 
the EFCI (Explicit Forward Congestion Indication) bit of the data cells when a 
congestion level is reached in the ABR output queue(s). The ABR destination 
will then return this congestion indication to the source using the Resource 
Management (RM) cells. There is no possibility to differentiate light from 
heavy congestion. Moreover, the indication received by the source is delayed 
due to the time it takes to carry this signal from the congested node to the 
destination and back to the source. 

• A second solution is the Relative Rate switch behaviour: it sets the CI 
(Congestion Indication) and NI (No Increase) bits in the RM (Resource 
Management) cells according to a switch congestion threshold. The switch can 
directly update the RM cells and therefore decrease the delay between the time 
the congestion is detected and the time the source is informed. 

• Unlike the other solutions where a blind computation of the source rate is 
done, the ER switch behaviour aims at providing each connection with its 
explicit rate. This requires specific hardware, but is recognized to provide 
the best performance in terms of cell loss, fairness, and throughput. 

The performances of the different solutions are usually measured in terms 
of fairness among the ABR connections and throughput achieved on the 
transmission links. 

The design of efficient ABR ER algorithms has been extensively studied 
over the last few years. One of the first issues was to decide whether to use credit (Kung 
1994) or rate based mechanisms. Although the former solution exhibits some 
advantages, the rate-based solution was chosen (Bennet, 1994, Van Boven 1995). 
Thereafter, ER marking was recognized as being more efficient than RR marking, 
although it increases the switch complexity, which is a major issue in ABR design. 
EPRCA (Enhanced Proportional Rate Control Algorithm) (Roberts, 1994) was 
proposed and widely cited in papers that either addressed performance 
studies (Fang, 1994, Ritter, 1996, Ohsaki, 1995, Ohsaki, 1996, Barnhart, 1994) or 
suggested improvements (Siu, 1994, Mascolo, 96). The ER is computed according 
to a congestion threshold. These algorithms are fairly simple but are not always 
fair and are often unable to control situations where sources are not greedy. A second 
set of algorithms was initiated at Ohio State University: OSU or ERICA (Jain, 
1996, Jain, 1997) as well as at MIT (Charny, 1995, Charny, 1996). In both cases, a 
switch needs to estimate the source rate in order to compute the ER, which is mainly a 
function of the link availability. Efficiency is improved at the expense of an 
increased complexity (if n is the number of connections flowing through a switch, 
some parameters are computed in O(n)). With these algorithms, the rate allocated 
to the source is often less than the available rate and fairness is not completely 
achieved. Most of the subsequent algorithms are a mix of the two 
above-mentioned ones. 

A third generation of algorithms appeared recently with the objective of 
fixing the above-mentioned problems. They use the ABR buffer capacity to compute 
the ER. The first solution presented in this paper is ERAQLES. It is shown that 
ERAQLES outperforms the other solutions. ERAQLES fairness and convergence 
have been demonstrated mathematically (Moret, 1997) and evaluated through 
simulation in various configurations (Moret, 1997). The second solution, akin to 
ERAQLES, is ERICA+ (Jain, 1997). It is an extension of ERICA that has not been 
completely specified and formally proved. 

ERAQLES and ERICA behaviours are presented in section 2 and will be 
compared in the remainder of the paper. Section 3 considers the situation where 
switches provided by different vendors are in the same network. These switches 
might implement RR or ER algorithms and it is therefore of utmost importance to 
verify their robustness when both solutions have to interact. Section 4 investigates 
the case where VBR sources are competing with ABR traffic. Section 5 concludes 
the paper. 




Figure 1. RR algorithm implementation in switches. 

2 ERAQLES AND ERICA 



2.1 RR switches 

A Relative Rate algorithm like [2] is described in Figure 1. We define the following 
notation: 

• e is the number of cells queued in the ABR queue; 

• b is the ABR queue maximum size; 

• Qh is a high-level threshold. If e is larger than Qh, the RR algorithm assumes 
that the switch is heavily congested; 

• Ql is a low-level threshold. If e is larger than Ql, the RR algorithm assumes that 
the switch is lightly congested; 

• CI and NI are the fields of the RM cells. 
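Since Figure 1 is not reproduced here, the following is only a plausible sketch of how such thresholds could drive the RM cell bits: it assumes that heavy congestion sets CI (asking the source to decrease) and light congestion sets NI (forbidding an increase). The function name and the dictionary representation of an RM cell are hypothetical.

```python
def relative_rate_mark(e, q_low, q_high, rm_cell):
    """Mark a backward RM cell from the ABR queue length e (assumed policy).

    rm_cell is a dict with integer fields "CI" and "NI"; bits that are already
    set by an upstream switch are never cleared here.
    """
    if e > q_high:
        rm_cell["CI"] = 1      # heavy congestion: request a rate decrease
    elif e > q_low:
        rm_cell["NI"] = 1      # light congestion: forbid a rate increase
    return rm_cell


if __name__ == "__main__":
    for queue_length in (100, 800, 1500):
        cell = {"CI": 0, "NI": 0}
        print(queue_length, relative_rate_mark(queue_length, 600, 1200, cell))
```

The thresholds 600 and 1200 cells in the usage lines are example values only.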

2.2 ERICA switches 

ERICA has been chosen as a comparison to ERAQLES because it is a well-known 
solution that was extensively studied in the literature, well understood and found to 
be efficient. After several experiments, the designers of ERICA have mentioned 
that their solution was not always fair. Therefore, the specification was modified in 
order to propose a better solution (Jain, 1997). This latter version is not considered 
in this paper because it still shows many important problems. Indeed, ERICA may 
become totally unstable when some connections have a non zero MCR (Minimum 
Cell Rate), and is not able to provide a fast computation of the total available rate 
nor the explicit rate for each connection. Figure 2 presents the switch behaviour of 
ERICA with the following definitions : 

• AI is the timer used to compute all variables of the switch (except SBRMi and 
ERCi); 

• TU is the Target Utilisation of the link for ERICA. TU is set to a value lower 
than 100% in order to reduce the ABR queue size and therefore the end-to-end 
delays; 

• Cabr is the total available bandwidth for the ABR service; 

• TCR is the Target Cell Rate according to the TU and Cabr values; 

• CLL is the Cell Load Level of the switch; 

• RCC is the Received Cell Counter. It is used to compute CLL; 

• A per-connection flag denotes whether connection i is considered active. If 
during the current timer period a data cell is received from connection i, ERICA 
assumes that connection i is active for the current timer period. This variable 
is reset for all connections when the timer expires; 

• FS is the computation of the Fair Share rate. It is equal to TCR over the 
number of active connections; 

• ERCi is the ER computation for connection i during the current timer period. If 
CLL is larger than one, it means that the switch can accommodate more 
traffic and the connection is allowed to increase its Current Cell Rate 
(according to the CCR field of the RM cell). If not, the ER computation 
results in a rate decrease. In all situations, ER must not exceed FS; 

• SBRMi means that a Backward RM cell was received on connection i before 
AI has timed out. It is used to compute a unique ER value for connection i 
during AI. This parameter is reset when the timer expires. 

For more details, a complete description of ERICA is provided in Jain (1996, 
1997). 
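The per-interval quantities listed above can be sketched as follows. This covers only the parts that the definitions state explicitly (TCR from TU and Cabr, the input rate from RCC over the timer period, and FS as TCR over the number of active connections); the exact ER rule is given in Figure 2 and in Jain (1996, 1997) and is deliberately not reproduced. The function name and the example numbers are hypothetical.

```python
def erica_interval_summary(cabr, tu, rcc, interval_s, active_flags):
    """Compute TCR, the measured input rate and FS at the end of one timer period.

    cabr: bandwidth available to ABR (cells/s); tu: target utilisation (0..1);
    rcc: cells received during the period; interval_s: period length (s);
    active_flags: one boolean per connection, True if a cell was seen.
    """
    tcr = tu * cabr                                  # Target Cell Rate
    input_rate = rcc / interval_s                    # measured ABR input rate
    n_active = max(1, sum(1 for flag in active_flags if flag))
    fs = tcr / n_active                              # Fair Share
    return tcr, input_rate, fs


if __name__ == "__main__":
    flags = [True] * 7 + [False] * 2
    print(erica_interval_summary(385_000, 0.9, 300, 0.001, flags))
```

Note that summing the flags and clearing them at every timer expiry touches every connection, which is the O(n) cost discussed later for ERICA.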




Figure 2. ERICA algorithm implementation in switches. 



2.3 ERAQLES switches 

ERAQLES behaviour is described in Figure 3, where: 

• e is the number of cells queued in the ABR queue, 

• b is the maximum ABR queue size, 

• r is the target number of cells in the ABR queue, 

• an average of e is taken over a period equal to NNrm RM cells, 

• the control period is the maximum delay between sources and a switch, 
adjusted at every connection set-up, 

• n is the number of ABR connections alive at time t. 

Figure 3. ERAQLES algorithm implementation in switches. 

The ER computed for connection i is γi = si · γ, where si is the fairness 
function and γ the total available bandwidth for ABR services. A simple choice for 
si is si = 1/n, where n is the number of active connections going through the 
switch. While this computation is simple, it does not take into account lightly 
loaded (referred to as "lazy") sources that do not use their maximum allowed 
bandwidth. In order to use the bandwidth left by "lazy" sources, a new function is 
introduced: 

\[ s_i = (1 - s_0)\, p_i + s_0 \]

where s0 = 1/n and pi is the ratio between ei (the number of cells for connection i 
queued in the ABR queue) and e (the total number of cells queued). s0 is the 
minimum bandwidth ratio that can be allocated. Some properties of ERAQLES are 
summarized below. According to the function si, if i and j are not two "lazy" 
connections, we obtain 

\[ |s_i - s_j| = |1 - s_0| \cdot |p_i - p_j| \]

si (respectively pi) is the value of the new (respectively current) bandwidth 
allocation ratio for connection i. In addition, we have |1 − s0| < 1. Therefore, the 
difference between the bandwidth allocated to i and j will decrease, and the si will 
converge to a common value. On the other hand, if i is a "lazy" connection, we have 
pi < pj, which implies si < sj. Connection j will then get a larger share of the 
bandwidth than connection i and be allowed to use the bandwidth ratio saved by the 
"lazy" connection. The total ABR bandwidth available in the switch is computed as 
follows: 

\[ \gamma = C + h \cdot \frac{r - e}{\text{control period}} \]

where the control period is the maximum delay between sources and the switch, 
and C is the estimate of the bandwidth left unused by reserved (CBR and VBR) 
traffic. C is re-evaluated every NNrm RM cells. In fact, the queue-dependent term 
can be interpreted as the maximum amount of bandwidth that the switch can 
distribute at a given instant, while still being able to control the flow of cells in 
the following control period. It was shown (Roche, 1995) that this control function 
drives the ABR queue towards a utilization (filling ratio) of r. h is an important 
parameter of the resource evaluation function. The larger it is, the slower the 
convergence. The optimum value for h (0.1839) was derived mathematically 
(Moret, 1997). 
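The computation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes s0 = 1/n, takes the queue-dependent term as h · (r − e) divided by the control period, and treats an empty queue as pi = 0. All function and parameter names are hypothetical.

```python
def fairness_share(e_i, e_total, n):
    """s_i = (1 - s0) * p_i + s0, with s0 = 1/n assumed and p_i = e_i / e."""
    s0 = 1.0 / n
    p_i = e_i / e_total if e_total > 0 else 0.0   # assumption for an empty queue
    return (1.0 - s0) * p_i + s0


def total_abr_bandwidth(c_unused, h, r, e, control_period):
    """Assumed form gamma = C + h * (r - e) / control_period (cells/s)."""
    return c_unused + h * (r - e) / control_period


def explicit_rate(e_i, e_total, n, c_unused, h, r, control_period):
    """ER for connection i: gamma_i = s_i * gamma."""
    gamma = total_abr_bandwidth(c_unused, h, r, e_total, control_period)
    return fairness_share(e_i, e_total, n) * gamma


if __name__ == "__main__":
    # Hypothetical state: 4 connections, 100 cells queued, target r = 500 cells.
    print(explicit_rate(30, 100, 4, 185_000.0, 0.1839, 500, 0.001))
```

When the queue is below its target r the correction term is positive and the switch hands out more than C; when the queue exceeds r the distributed bandwidth drops below C, which is what pulls the queue back towards the target.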

In order to compute the ER for connection i, ERAQLES needs to know 
the number of cells for this particular connection stored in the ABR queue. For that 
purpose, ERAQLES uses a table where this information is stored for each ABR 
connection. Although this table is of size n, whatever the event being processed, 
the reading or the modification of this table is always limited to a single element. 
The complexity is then O(1) and does not depend on the number of connections 
going through the switch. If we now consider ERICA as a comparison, we find 
that it also needs a table of size n that accounts for the number of active 
connections. However, unlike ERAQLES, the complexity to manage this table can 
be O(n). This is because the entire table must be reset every AI units of time (the 
designers of ERICA suggest that AI must be lower than 1 ms). We can conclude 
that the complexity of ERAQLES is lower than that of ERICA. 
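The constant-time table access can be illustrated with a small data structure. The class below is a hypothetical sketch: each cell arrival or departure updates exactly one per-connection counter and one running total, so no event requires a pass over all n connections.

```python
class QueueOccupancyTable:
    """Per-connection count of cells held in the ABR queue (illustrative)."""

    def __init__(self):
        self.cells = {}   # connection id -> cells currently queued (e_i)
        self.total = 0    # total cells queued (e)

    def on_enqueue(self, conn_id):
        """A cell of conn_id enters the queue: one table entry is updated."""
        self.cells[conn_id] = self.cells.get(conn_id, 0) + 1
        self.total += 1

    def on_dequeue(self, conn_id):
        """A cell of conn_id leaves the queue: one table entry is updated."""
        if self.cells.get(conn_id, 0) == 0:
            raise ValueError("no queued cell for this connection")
        self.cells[conn_id] -= 1
        self.total -= 1

    def occupancy_ratio(self, conn_id):
        """p_i = e_i / e, or 0.0 when the queue is empty."""
        return self.cells.get(conn_id, 0) / self.total if self.total else 0.0


if __name__ == "__main__":
    table = QueueOccupancyTable()
    for conn in ("A", "A", "B"):
        table.on_enqueue(conn)
    print(table.occupancy_ratio("A"))
```

By contrast, a scheme that clears an activity flag for every connection at each timer expiry needs a loop over the whole table.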



3 ER ALGORITHMS IN DIFFERENT ENVIRONMENTS 

The ability of these algorithms to achieve fairness in some interesting 
situations is investigated. We first consider the case of networking environments 
where both RR and ER switches are involved. The homogeneous case assumes that 
all switches are compliant with an ER algorithm (ERAQLES or ERICA). In the 
heterogeneous case, the switches that experience the bottleneck conform to the RR 
behaviour (section 2.1). This situation was chosen because it appears that ERICA 
does not behave as expected when 
RR switches are experiencing/controlling the bottleneck. To our knowledge, this 
problem was first discovered by Plotkin and Sydir (Plotkin, 1997). The 
configuration shown in Figure 4 is exactly the same as the one studied by Plotkin 
and Sydir. It was important to take exactly the same situation to demonstrate that 
ERAQLES was able to perform properly, unlike ERICA. 



Figure 4. Network topology with multiple bottlenecks. 

The network includes 6 switching nodes (S1 to S6) and 9 greedy ABR 
sources, from A to I. The maximum throughput of the links is 385 000 cells 
per second (≈ 155 Mbps). The propagation delay between consecutive nodes is 
250 μs (≈ 50 km). The first bottleneck is located between nodes S3 and 
S4, and affects connections A to D and G to I. The second bottleneck is between 
S5 and S6 and influences connections A to F. Roughly, each of the connections A 
to D and G to I will be able to transmit 54 000 cells/s (1/7 of the link rate), 
while E and F will send 80 000 cells/s (3/2 × 1/7 of the link rate). The other 
parameter values are the ABR queue 
size b = 2000 cells, the decrease factor RDF = 1/16 and the increase factor RIF = 
1/256. 
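The expected rates follow from a max-min fair allocation over the two bottleneck links. The sketch below uses a generic progressive-filling procedure (not an algorithm taken from the paper) and encodes only the connection sets stated above; the link names are labels chosen for the example.

```python
def max_min_fair(link_capacity, links):
    """Max-min fair rates by progressive filling.

    links maps a link name to the set of connections crossing it; every link
    is assumed to have the same capacity (cells/s).
    """
    remaining = {name: float(link_capacity) for name in links}
    unfrozen = {name: set(conns) for name, conns in links.items()}
    rates = {}
    while any(unfrozen.values()):
        # The most constrained link fixes the rate of its unfrozen connections.
        share, bottleneck = min(
            (remaining[name] / len(conns), name)
            for name, conns in unfrozen.items() if conns)
        for conn in list(unfrozen[bottleneck]):
            rates[conn] = share
            for name, conns in unfrozen.items():
                if conn in conns:
                    conns.discard(conn)
                    remaining[name] -= share
    return rates


if __name__ == "__main__":
    topology = {
        "S3-S4": set("ABCDGHI"),   # first bottleneck: A to D and G to I
        "S5-S6": set("ABCDEF"),    # second bottleneck: A to F
    }
    print(max_min_fair(385_000, topology))
```

With a link rate of 385 000 cells/s this yields 55 000 cells/s for A to D and G to I, and 82 500 cells/s for E and F, which the text rounds to roughly 54 000 and 80 000 cells/s.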

The default parameter values for the ERAQLES switches are NNrm = 512 
RM cells and r = 500 cells. For ERICA, the default parameter values are TU = 
90% and AI = 1 ms, which correspond to the values recommended by its 
designers. Finally, for the RR switches, we have Ql = 600 cells and Qh = 1200 
cells. 




Figure 5: throughput per connection in a homogeneous configuration 
(ERAQLES) 

3.1 ERAQLES performance in a homogeneous environment 

The simulation results for ERAQLES are shown in Figure 5. They show 
that after a certain convergence delay, the transmission rates of the different 
connections converge to the expected values (54 000 cells/s and 80 000 cells/s). 
Other algorithms like ERICA would achieve about the same results. In 
conclusion, good results are obtained when all switches implement an ER 
algorithm. 

3.2 ERAQLES and ERICA in a heterogeneous environment 

The congested nodes, S3 and S4, implement the RR behaviour described in section 
2.1. All the other nodes implement ERAQLES or ERICA ER algorithms. The 
simulation results are shown in Figure 6 for ERAQLES and Figure 7 for ERICA. 

For ERAQLES, while the convergence delay is larger than in the 
homogeneous case, the allocated bandwidth converges to 54 000 cells/s and 
80 000 cells/s depending on the sources. The fairness property, already 
demonstrated for ERAQLES in a homogeneous environment, is preserved in a 
mixed environment where RR and ER switches are interconnected. In this case, 
ERAQLES is said to be compatible with RR switching nodes. This is an important 
result because it was shown that ERICA is not able to provide good bandwidth 
sharing in the same situation. Therefore, ERAQLES is more robust than ERICA. 



Figure 6: throughput per connection in a heterogeneous configuration 
(ERAQLES) 

Figure 7: throughput per connection in a heterogeneous configuration (ERICA) 




4 ERAQLES, ERICA AND VBR SOURCES 

The objective of this section is to explore the behaviour of both solutions when the 
network is fed by VBR sources that compete with ABR. The simple example 
considered in this section is sufficient to show the benefit of using the ABR 
queue capacity (as in ERAQLES) to increase the statistical gain and achieve 
the target throughput. All the tests are based on the network configuration shown 
in Figure 8. 

Figure 8. A simple configuration with 2 VBR sources. 

The distance between 2 consecutive switching nodes (S1, S2 and S3) is 1
kilometer. A is an ABR connection while B and C are VBR connections. B is
attached to S1 and terminated in S2. Its traffic will modify the computation of the
rate for the ABR connection on the link S1-S2. Similarly, C will influence the
computation of the rate on link S2-S3. The VBR sources are modelled by the same
ON-OFF processes. Under these conditions, the links between the switches can be
congested independently. An algorithm that does not use the ABR buffer size
available in the switch will consider that the network is congested as soon as
a single VBR source is active. An algorithm that does use it will consider that
the network is congested only when the two VBR sources are active. Thus, the
second algorithm should be more efficient than the first one.

During the activity period, the rate of each VBR source is 200 000 cells/s
and the mean activity/inactivity period length is S. The other parameters are the
maximal link rate = 385 000 cells/s, b = 10 000 cells, r = 5000 cells, RIF = 
PCR = (RIF =PCR so that the ABR source will always transmit at the ER value 
carried in the RM cell). 

4.1 Simulation results with a constant S



In this section, we assume that the activity/inactivity period S is constant. 
Two synchronization scenarios between the VBR sources are considered as shown 
in Figure 9. 




Figure 9. Synchronization schemes for the 2 VBR sources. 

When the VBR sources are fully synchronized, the available bandwidth
on the S1-S2 and S2-S3 links is reduced by 200 000 cells/s while B and C are
active. The ABR connection (A) should then be able to transmit 185 000 cells/s
(including RM cells) during these periods. The simulations, run with any value
of S, show that with ERICA, A is able to send 260 000 cells/s and with ERAQLES
284 000 cells/s. In fact, the ERICA throughput is bounded by a target
utilization set to 90% (260 000 ≈ 285 000 * 0.9). In this case, similar results
are obtained with ERICA and ERAQLES.

We now assume that the VBR sources are in opposition: they are never 
active simultaneously. The throughput achieved as a function of S is presented in 
Figure 10. It shows that the throughput allowed by ERAQLES is larger than with 
ERICA: 

• The throughput derived from ERICA is constant and equal to 167 000 cells/s.
In such a situation, ERICA measures a constant VBR load of 200 000 cells/s,
and then allows the ABR source to transmit at 167 000 cells/s ((385 000 -
200 000)*0.9).

• The throughput derived from ERAQLES varies from 285 000 cells/s (small S)
to 185 000 cells/s (large S). In fact, S1 absorbs the traffic issued by A, which
implies that the ABR queue in S2 is often lower than r. When B is OFF, the
queue in S1 decreases; its occupancy will also be lower than r and
ERAQLES will compute an ER larger than 185 000 cells/s. Moreover, when B
gets ON, there is a certain delay necessary to fill the ABR queue in S1 up to r;
during this delay, the rate is larger than 185 000 cells/s. Thus, when S is large,
the transition periods are not frequent and ERAQLES has the same behaviour
as ERICA. On the other hand, when the transitions are frequent, ERAQLES
behaviour improves, reaching the same throughput as when the sources are
synchronized.
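
The ERICA figures quoted in this section follow from simple arithmetic on the link rate, the measured VBR load and the 90% target utilization. The sketch below only reproduces that arithmetic; the function name is ours, and the 100 000 cells/s mean load in the synchronized case assumes that ON and OFF periods have the same mean length S.

```python
LINK_RATE = 385_000        # cells/s, maximal link rate given in the text
VBR_PEAK = 200_000         # cells/s, rate of one VBR source while it is ON
TARGET_UTILIZATION = 0.9   # ERICA target utilization given in the text


def erica_abr_rate(measured_vbr_load):
    """ABR rate granted when ERICA sees `measured_vbr_load` cells/s of VBR traffic."""
    return (LINK_RATE - measured_vbr_load) * TARGET_UTILIZATION


# Sources in opposition: one of the two links always carries a full VBR burst.
opposition = round(erica_abr_rate(VBR_PEAK))

# Synchronized sources (assumed 50% duty cycle): mean VBR load is half the peak.
synchronized = round(erica_abr_rate(VBR_PEAK / 2))

print(opposition, synchronized)
```

The two values, 166 500 and 256 500 cells/s, are the unrounded counterparts of the 167 000 and 260 000 cells/s reported for ERICA.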



Figure 10. Simulation results with 2 VBR sources in opposition - constant period
(received throughput in cells/s versus S in seconds).

Figure 11. Simulation results with exponential ON/OFF period duration.

4.2 Simulation results with an exponential S

Additional tests were carried out with an active period length that follows
an exponential distribution. Results are presented in Figure 11. They show that
ERICA behaviour is improved compared to the previous situation; the maximum
throughput value is 222 000 cells/s instead of 185 000 cells/s, which implies a total
of 322 000 cells/s on the links (83.6%) for a target load of 90%. However, the
difference is still in favour of ERAQLES, which can allow up to 268 000 cells/s,
corresponding to a total throughput of 95.6% on the links.



5 CONCLUSION 

ERAQLES is a novel Explicit Rate ABR algorithm. It is original because
it uses the buffers available for the ABR traffic in the switches. This feature makes
it possible to smooth the rate variations due to the VBR traffic and therefore to limit the
throughput degradation during the congestion periods. Moreover, it uses a novel
control function that makes the ABR queue converge to a target value r. It was
important to show that ERAQLES is robust in various environments where ERICA
was found to have problems. We have chosen ERICA as a comparison "metric"
because it is a well-known solution that was extensively studied in the literature and
found to be efficient.

The simulation results presented in this paper have shown that our
solution provides a fair and efficient bandwidth allocation among ABR
connections in all the environments considered. The allowed rate and
link throughput achieved by ERAQLES are about 25% larger than with ERICA.

Abstract

Using real traffic data, we show that neural network-based prediction techniques
can predict the queuing behaviour of the highly bursty traffic typical
of LAN interconnection accurately enough to allow dynamic
renegotiation of a DBR traffic contract at the edge of an ATM network.

The performance of predictor-based in-service renegotiation is evaluated
in terms of renegotiation errors and reserved bandwidth for the DBR traffic
handling capability, and is shown to be very encouraging for the use of
connectionist prediction techniques for the management of bursty traffic in ATM
networks.

Keywords: neural networks, traffic prediction, leaky bucket, LAN
interconnection, ATM networks.

1 Introduction 

In order to realize its promises as the B-ISDN transfer mode, ATM has to fulfill
two conflicting requirements, namely “Bandwidth on Demand” and “Guaranteed
Quality of Service (QoS)”, for various types of traffic. This is particularly
challenging in the case of variable bit rate (VBR) traffic, such as compressed
video or LAN interconnection, where the behaviour of the sources is not
well-defined in terms of bandwidth requirements.

*now at INRS-Télécommunications, 16 place du Commerce, Ile-des-Soeurs (QC)
CANADA, H3E 1H6.

†to whom correspondence should be addressed




In order to fulfill the “Guaranteed QoS” requirement, traffic should not
be allowed to access the network without control, and such a control (traffic
policing) is specified in terms of continuous state leaky buckets (also known
as generic cell rate algorithm or virtual scheduling algorithm) at the network
edges [1, 11]. This implementation supposes a traffic contract between the
source and the network which defines the behaviour of the source in terms of
mean cell inter-arrival time and cell delay variation tolerance. The enforcement
of this traffic contract at the User-Network Interface (UNI) protects the
network against bursts of uncontrolled length and intensity, and such a traffic
characterization makes it possible to reserve the necessary resources inside the
network so as to guarantee the required QoS. Various schemes can be used to
reserve those resources and one of them, namely the Deterministic Bit Rate (DBR)
traffic handling capability, will be studied below. In the following, we shall be
concerned with a restrictive definition of the quality of service, mainly in terms
of cell loss, as we only address the problem of data traffic.

The “Bandwidth on Demand” requirement can then be implemented by
renegotiating (periodically or upon request from the source) the traffic
contract, using for instance a Fast Reservation Protocol (FRP) [2]. However
fast they may be, resource reservation protocols cannot be based on the
instantaneous characteristics of the traffic to be carried: reservation of the resources
involves a latency of the order of the network round trip time at least and,
moreover, the operation of these protocols should not overload the network in
terms of processing time. This points out the need for the source to be able to
efficiently predict its traffic descriptor over a typical inter-negotiation period.

Although this access control scheme based on both resource reservation
and enforcement of the declared traffic descriptors allows an efficient use of
the network resources, it may be quite difficult to implement from a source
point of view, especially for very bursty traffic such as LAN
interconnection: such bursty sources cannot efficiently negotiate their traffic
contract for the next period without being able to accurately predict their own
behaviour during this period. Such a prediction capability is indeed an essential
requirement for the realization of the promises of ATM.

Although predicting traffic with neural networks has been advocated for
compressed video [5], we are not aware of such a study for data traffic or for
the time-scales considered below. In this contribution, we shall show, using
real bursty traffic data, that such a prediction of the queuing behaviour of such
traffic is indeed possible with neural networks.

This may seem in disagreement with the conclusions of recent studies of 
LAN and WAN traffic which have evidenced the wide intensity variations and 
long term correlations existing in such traffics [14, 16]. It should be recalled 
that we are not in any way trying to predict the behaviour of the traffic itself, 
but we rather try to predict the extreme behaviour of a queue driven by the 
traffic so as to define an appropriate traffic descriptor in the ATM framework 
for the next period. In this respect, while leaving the question of modeling 
data traffic open, this study aims at giving a pragmatic answer to the problem 
of “fitting” such traffic into the rather rigid requirements of traffic policing at
the edge of an ATM network. 

The framework of this study is summed up in Figure 1: a pair of LANs is
interconnected through a pair of VCs inside an ATM WAN; note that individual
sources belonging to a LAN are multiplexed at the VC level. The prediction
function is implemented on this multiplexed traffic, at the ingress of the
ATM-WAN only (on the LAN side of the UNI) and is used to periodically renegotiate
the usage parameters of the outgoing VC with the CAC (Call Acceptance
Control). The conformance of the traffic to the negotiated usage parameters is
enforced on the WAN side of the UNI by the UPC (Usage Parameter Control).




Figure 1: Framework of this study: the LAN traffic is multiplexed on a single
VC and the prediction function is implemented at the ingress of the ATM-WAN



In this study we shall not address the problem of the influence of rejected 
renegotiations at the CAC level (i.e. we assume that predicted usage parame- 
ters are always accepted by the CAC) and confine ourselves to the prediction 
problem. 

We note here that the usefulness of traffic prediction is not restricted to the
UNI as described above; as a matter of fact, renegotiation of resources, either
using signalling protocols or in-band reservation schemes, is also performed at
other interfaces (typically NNIs), so that traffic prediction, if indeed efficient,
could be implemented ubiquitously in ATM networks.

The paper is organized as follows: after a presentation of the DBR traffic
handling capability, we shall briefly discuss the possible benefits of periodically
renegotiating the resources needed in the case of a bursty traffic; we shall
then present the connectionist models for time series prediction, describe our
predictor implementation and discuss the results.

2 Resource Allocation Overview



Among the various ways of allocating resources in an ATM network while 
protecting the QoS defined by the ITU-T (DBR, SBR, ABR, ABT, see [15, 11] 
for more details), we shall concentrate on DBR (Deterministic Bit Rate) and 
on the implementation of in-service parameter renegotiation for this capability. 

2.1 Description of the DBR Capability 

Hereafter, we briefly describe the DBR ATM layer traffic handling capability 
as currently standardized [11], that is without in-service renegotiation of the 
parameters. 

For this capability, the source simply declares a peak cell rate (PCR) and a
cell delay variation tolerance (Tpcr) for the duration of the call, and reservation
will be attempted on the basis of PCR.

The algorithmic definition of the peak cell rate is related to a virtual queue:
the actual rate of a source is considered to be below the negotiated PCR as
long as the buffer level of a (virtual) queue that is emptied at PCR is below a
threshold Lmax which is related to the negotiated cell delay variation (CDV)
tolerance Tpcr by

$$L_{max} = \mathrm{PCR} \times T_{pcr}$$

This definition is summed up in Figure 2. 




Figure 2: Definition of the DBR traffic descriptor parameters 



The algorithm used at the UPC so as to enforce the conformity of the incom- 
ing traffic to the traffic descriptor is known as the generic cell rate algorithm 
(GCRA). 
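
To make the conformance test concrete, here is a minimal sketch of the GCRA in its virtual scheduling form, with increment 1/PCR and limit Tpcr. The function and variable names are ours; the sketch works on a list of cell arrival times in seconds and is not taken from the paper.

```python
def gcra_conforming(arrival_times, pcr, t_pcr):
    """Flag each cell as conforming (True) or not, for GCRA(1/pcr, t_pcr).

    arrival_times: non-decreasing cell arrival times in seconds.
    pcr: negotiated peak cell rate in cells/s.
    t_pcr: cell delay variation tolerance in seconds.
    """
    increment = 1.0 / pcr
    tat = None  # theoretical arrival time of the next cell
    flags = []
    for t in arrival_times:
        if tat is None or t >= tat - t_pcr:
            # Cell is on time, or early by no more than the tolerance.
            tat = max(t, tat if tat is not None else t) + increment
            flags.append(True)
        else:
            # Cell is too early: non-conforming, TAT is left unchanged.
            flags.append(False)
    return flags


flags = gcra_conforming([0.0, 0.1, 0.12, 0.3], pcr=10.0, t_pcr=0.05)
print(flags)
```

In this toy run the third cell arrives 0.08 s before its theoretical arrival time, which exceeds the 0.05 s tolerance, so it is the only one flagged as non-conforming.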

2.2 In-Service Renegotiation 

Obviously, it may be quite difficult to set the parameters defined above for the
duration of the call, especially in the case of data traffic.

In this contribution, we propose to take advantage of the prediction ability
of neural networks to renegotiate these parameters during the call (in-service 
renegotiation), so as to follow more closely the needs of the traffic. It should be 
noted that this renegotiation will not be performed in band, as in the ABT-IT 
capability, but will involve signalling automata as being currently standardized 
at the ITU-T [17]. 

We shall only consider periodic renegotiation of the parameters: instead of 
negotiating the parameter PCR for the duration of the call, negotiation will 
be carried for the next period under the assumption that Tpcr is fixed. 

Hereafter, we shall use for the peak cell rate the minimum value satisfying 
the conditions imposed by the GCRA, hence requiring the maximum precision 
from our predictor. In a real situation, some kind of safety margin might be 
allowed of course but, even under this most stringent requirement, we shall see 
that our predictor behaves very well since the notion of safety margin can be 
included in the construction of the predictor. 

We present below the traffic trace which has been used for this study. 



3 Description of the Traffic Traces 

As explained below, in order to get reliable results about the prediction
capabilities of neural networks, it is necessary to use large real traces. The traces we
have used consist of TCP traffic recorded at the Berkeley and CNET Lannion
gateways to the Internet. The traces are recorded on a packet-by-packet basis,
each packet being characterized by its arrival time and the amount of TCP
data transferred.

One should be careful when using traffic traces recorded on existing net- 
works for studies of mechanisms to be implemented in future networks: obvi- 
ously, using real traffic traces to design and test new congestion management 
mechanisms for instance may be misleading since the characteristics of the 
trace itself can be strongly dependent on already existing protocols (TCP in 
our trace, for instance). The present situation is different: the trace we use
certainly includes inter-network TCP dynamics but, as the application we are
aiming at is mainly the interconnection of private networks by ATM links, this is
not a drawback since traffic originating from such networks (which often are
inter-networks themselves) will also contain such dynamics, TCP/IP being likely to
remain the main protocol stack in the area of data communications for the near
future.

The Berkeley trace, hereafter referred to as the LBL-PKT3 trace, has been
thoroughly studied by other groups [16, 20] and has been shown to exhibit a
very high variability (the average rate of the trace is 0.35 Mb/s with peak rates
up to 1.7 Mb/s even when the rates are averaged on a time window as large as
10 s) and strong long-range correlations or non-stationary behaviour (see [20]
for a discussion of this issue).

The traces recorded at the CNET Lannion gateway exhibit similar
characteristics and consist of 12 hours of TCP traffic.

Such traces should be representative of the data traffic which ATM will have
to carry so as to support virtual private networks and “wide area LANs”.




Figure 3: Evolution of the LBL-PKT3 trace. Each dot represents the mean
input rate (in Mb/s) during a period of 10 s. The whole trace lasts 2 hours and
has a mean input rate of 0.351 Mb/s.



As Figure 3 suggests, the resources needed by the traffic vary widely
over time (even when averaged on a 10 s time scale), indicating potential resource
savings if such variations can be predicted. We shall now turn to connectionist
models for time series prediction.



4 Connectionist Models for Time Series Prediction

Let a given one-variable time series be represented by the N values {x_1, x_2,
..., x_N}. Prediction then consists in finding the future values {x_{N+1}, x_{N+2}, ...}.

Takens [19] has shown that if the series is obtained from a deterministic
dynamical system, there exists a scalar d (which is called the embedding dimension),
a scalar r (which is an arbitrary delay) and a function f such that for every
t > d · r:

$$x_t = f(x_{t-r}, x_{t-2r}, \ldots, x_{t-dr}) \qquad (1)$$

The prediction problem consists, given the first N values of a time series,
in finding the appropriate d, r and f. Of course one usually cannot be sure that a
given series is deterministic. Actually, statistical methods do exist to verify if a 
series is deterministic and to find d as well as r but they require the size of the 
series to be on the order of 10^ which is rarely the case in practical problems. 
For the moment, let us assume that we know d and r and that we want to find 
f. This is where neural networks come in: it is a well known fact that they can
be used as universal function approximators [10]. 
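
As an illustration of how the series is turned into a supervised learning problem, the sketch below builds the (input vector, target) pairs implied by the delay embedding, for a given embedding dimension d and delay r. The function name and the 0-based indexing are ours.

```python
def delay_vectors(series, d, r):
    """Build (inputs, target) pairs from a time series.

    For each admissible t, inputs = [x[t-r], x[t-2r], ..., x[t-d*r]]
    and target = x[t]; these are the pairs a function approximator
    would be trained on.
    """
    pairs = []
    for t in range(d * r, len(series)):
        inputs = [series[t - k * r] for k in range(1, d + 1)]
        pairs.append((inputs, series[t]))
    return pairs


pairs = delay_vectors([0, 1, 2, 3, 4, 5], d=2, r=2)
print(pairs)
```

A series of N values yields N - d·r pairs, which is one reason why large d and r quickly reduce the amount of usable training data.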

The time series is cut into three non-overlapping sets: a training set, a 
validation set and a test set. The training set is used to find the weights of 
the neural network by minimizing a cost function using an iterative learning 
algorithm such as the backpropagation algorithm [18], the validation set is used 
to monitor the learning process (by cross-validation) and the test set is used 
to verify the real prediction performance of the network (that is, an estimated 
prediction error on future time series values). 

In prediction problems, we train the network with past examples (thus, we 
minimize a training error) but we really want our network to perform well on 
future examples (thus, have a minimal generalization error). We use the vali- 
dation set to estimate generalization error (note that the data in the validation 
set are not used to minimize the cost: minimization is only performed for the 
data in the training set). Training is stopped when the generalization error 
estimated on the validation set starts to increase (even if the training error 
is still decreasing), indicating that the training process begins to over-fit the 
training set. 
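
The stopping rule can be sketched on a deliberately tiny model. The example below fits y = w·x + b by gradient descent instead of training a multilayer perceptron, and the `patience` parameter (number of consecutive non-improving epochs tolerated before stopping) is our assumption; only the use of the validation set to decide when to stop mirrors the text.

```python
def train_with_early_stopping(train, valid, lr=0.01, max_epochs=1000, patience=5):
    """Fit y = w*x + b on `train`; stop when the validation error stops improving.

    train, valid: lists of (x, y) pairs.
    Returns (best validation error, w, b) at the best epoch seen.
    """
    def mse(data, w, b):
        return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

    w, b = 0.0, 0.0
    best = (mse(valid, w, b), w, b)
    bad_epochs = 0
    for _ in range(max_epochs):
        # The gradient uses the training set only; validation data never
        # drive the weight update.
        gw = sum(2 * (w * x + b - y) * x for x, y in train) / len(train)
        gb = sum(2 * (w * x + b - y) for x, y in train) / len(train)
        w, b = w - lr * gw, b - lr * gb
        err = mse(valid, w, b)
        if err < best[0]:
            best, bad_epochs = (err, w, b), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # generalization error estimate has started to rise
    return best


train = [(x, 2 * x + 1) for x in [0, 1, 2, 3]]
valid = [(x, 2 * x + 1) for x in [0.5, 1.5, 2.5]]
best_err, w, b = train_with_early_stopping(train, valid)
print(best_err, w, b)
```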

The best heuristics used to select r are based on the hypothesis that two 
successive values of the input data vector must be the least related in order to 
maximize information. For instance, one can choose the first zero of the auto- 
correlation function, or the first minimum of the mutual information function. 
In both cases of course, r must be as small as possible. 
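
The first of these heuristics can be sketched as follows. The code estimates the sample autocorrelation and returns the smallest lag at which it is no longer positive; the function names and the `max_lag` bound are ours, and a series with zero variance is not handled.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return cov / var


def first_zero_crossing_lag(series, max_lag):
    """Smallest lag in 1..max_lag whose autocorrelation is <= 0, or None."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) <= 0.0:
            return lag
    return None


# Toy square wave of period 4: correlated at lag 1, anti-correlated at lag 2.
square_wave = [1, 1, -1, -1] * 5
print(first_zero_crossing_lag(square_wave, max_lag=5))
```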

The neural networks used in this study are multilayer perceptrons with one 
hidden layer. The architecture of such multilayer perceptrons is defined by the 
number of neurons in the input layer (i.e. the embedding dimension of the 
data) and in the hidden layer. 

Many heuristics exist to determine these architectural parameters, but this 
is still a hard problem. We also use cross-validation to select the neural network 
architecture. 



5 Predicting the traffic descriptor for the next 
period 

We wish to implement a prediction-based renegotiation of the DBR contract. 
We are thus looking for a mapping with the following inputs: 

• the current queue size, the current bit rate, 

• some kind of information characterizing the past traffic, 

and which would give as output the PCR consistent with a given Tpcr (in
this work, Tpcr is fixed for the whole trace) and the future traffic on the next
H seconds.

This is not a simple prediction problem. In fact, the predictor should not
only predict the future characterization of the traffic (this is the prediction 
part), but also deduce, for a given future traffic characterization and initial 
queue size, what would be the maximum queue size reached in the next period 
(this is the function approximation part). As it is known that neural networks 
are good for prediction and function approximation, they are good candidates 
to solve this problem. 

The information which characterizes the past traffic and the learning strate- 
gies are key issues for this prediction problem. They are described below (see 
Section 8). 

6 Framework of the experiments 

The following parameters have been used in our simulations: 

• leaky bucket dimensioning: Tpcr = 0.1 s (which is consistent with the fact
that data transmissions are only slightly sensitive to delays);

• we chose a value of 10 s for the negotiation period. The various ATM 
layer traffic handling capabilities and signalling mechanisms being still 
under discussion inside the standardizing bodies, this figure, although 
reasonable, should only be considered as indicative. We note here that in 
a different context, a renegotiation period of 1 s was estimated to allow 
as much as 40,000 calls [9]; therefore a value of 10 s should not stress 
the signalling mechanisms beyond their limits even for a large number of 
calls. 

Hence, in this experiment, PCR is predicted for the next 10 s period, and
reservation is carried out on the basis of PCR only. Hereafter, we refer to this
experiment as DBR-10s.

We would like to stress here that, as we are trying to predict the behaviour
of a constrained extremum, the problem is all the more difficult as the prediction
horizon increases. Therefore, a 10 s horizon represents a significant challenge.

The performance of the prediction machine is compared to the performance 
of an “oracle” who perfectly knows the future for the next negotiation period: 
the oracle does not attempt any “prediction” but simply calculates the param- 
eters from the data of the next 10 s; it is used to test the performance of the 
predictor, and its performance itself is also interesting since it shows what can 
be expected from optimal renegotiation when applied to a real bursty traffic. 
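
One way to picture the oracle is sketched below. It assumes, as a simplification of ours, that the next period is available as a list of data volumes per slot of length `dt` and that the queue is a fluid queue sampled at slot ends. The oracle then looks for the smallest rate D such that the peak backlog stays below D × Tpcr, using bisection since the backlog can only decrease when D grows. All names are hypothetical.

```python
def max_queue(volumes, rate, dt, q0=0.0):
    """Peak backlog (bits) of a fluid queue drained at `rate` bit/s.

    volumes[k] is the amount of data (bits) arriving during slot k of length dt.
    """
    q, peak = q0, q0
    for v in volumes:
        q = max(0.0, q + v - rate * dt)
        peak = max(peak, q)
    return peak


def oracle_min_rate(volumes, dt, t_pcr, q0=0.0, tol=1.0):
    """Smallest rate D (within `tol` bit/s) with max_queue <= D * t_pcr."""
    lo = 0.0
    # This upper bound empties the queue in one slot and covers q0, so it is feasible.
    hi = (q0 + sum(volumes)) / min(dt, t_pcr) + tol
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if max_queue(volumes, mid, dt, q0) <= mid * t_pcr:
            hi = mid
        else:
            lo = mid
    return hi


# Ten slots of 0.1 s carrying 100 bits each, tolerance 0.1 s.
rate = oracle_min_rate([100.0] * 10, dt=0.1, t_pcr=0.1)
print(rate)
```

In this toy case the constraint 10 × (100 - 0.1 D) <= 0.1 D gives D >= 909.1 bit/s, slightly below the 1000 bit/s arrival rate because the tolerance allows some backlog.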

We shall first use the oracle to show the benefits brought by renegotiation; 
then we shall present the performance of our predictor for DBR using various 
learning strategies. 
7 Oracle Results



We first want to illustrate the importance of being able to dynamically negotiate
the bandwidth in the case of a bursty traffic; Table 1 shows the resources in
terms of buffer size needed if one aims, while not renegotiating the PCR, at
getting the same performance as DBR-10s in terms of mean rate (Rmean
fixed, i.e. standard DBR case). Also given are the rates needed to get the same
performance in terms of mean queue length (Lmean fixed) and maximum queue
length (Lmax fixed).



when             | Resources needed using...
                 | DBR-10s          | Standard DBR
Rmean = 0.9 Mb/s | Lmax = 0.4 Mb    | Lmax = 23.1 Mb
Lmean = 0.09 Mb  | Rmean = 0.9 Mb/s | Rmean = 5.5 Mb/s
Lmax = 0.4 Mb    | Rmean = 0.9 Mb/s | Rmean = 3.7 Mb/s

Table 1: Comparisons between the use of DBR-10s with an oracle and standard
DBR (no renegotiation).

Obviously, an optimal dynamic negotiation of the bandwidth saves
resources. We shall show below that, although not optimal, prediction-based
dynamic negotiation is indeed possible and also saves resources.

8 Results for DBR-10s Using the Neural Network Predictor

In this section we present our results for different learning strategies and char- 
acterizations of the past traffic. 

Analysis of the time series characterizing the traffic led us to choose r = 1
and, from cross-validation, we determined d = 20, but the precise value
appeared not to be crucial (if large enough).

8.1 A first “heavyweight” experiment

For a first experiment, the characterization of the past traffic was chosen to be 
the traffic means and variances of the volume of data arriving in 0.1s jumping 
windows, for the last 2 seconds. 

Using LBL-PKT3, we thus generated 72000 points of a time series character- 
izing the traffic behaviour, which was cut into three equal and non-overlapping 
sets (training, validation and test). The test set corresponds to the last 40 
minutes of the trace. 

The learning strategy was the following: for each time frame of 10 seconds,
we furthermore generated 9 fictive initial conditions (3 current queue sizes x
3 current bit rates), which were chosen around the initial conditions obtained
by the oracle for this time frame. We then computed for each situation, given
that we knew the future of the trace, the minimum bit rate consistent with Tpcr for
the next time frame of 10 seconds. Hence, a sample is made of

• the current queue length

• the current bit rate 

• the 20 means and 20 variances characterizing the past traffic 

• the target value of PCR which is used for the training of the neural 
network. 

This finally gave us a training set and a validation set of 216000 samples 
each. 

The results of this learning strategy were reported and discussed in [7]. As
shown in Figure 4, the reservations made by the neural network are consistent
with the activity of the source, and the negotiated traffic contract is violated
only once on the whole trace. See [7] for more details.




Figure 4: Results of the “heavyweight” learning strategy for the LBL-PKT3 
trace. The solid line shows the bit rate when the oracle is used; the dotted line 
shows the bit rate when our predictor is used. 



The main drawbacks of this approach are:

1. a very large learning set leading to very long training times,

2. a difficult choice of the correct initial conditions to be generated: for the
neural network used above, these initial conditions were chosen around 
the values obtained by the oracle, a choice which, post facto, did not 
appear so good since the NN-predictor tends to use systematically greater 
bit rates than the oracle (which is quite natural) and hence generates 
smaller queues, so that the system driven by the NN-predictor evolves in 
a part of the phase space significantly different from the part where it 
was taught (i.e. when the system is driven by the oracle), 

3. a lack of “intuitive” control of the learning process: once the training and 
validation sets are generated, we have no control of what is happening. 

The main conclusion of this experiment is that a “blind and heavyweight”
approach to our problem is indeed effective; in the following we shall investigate
learning strategies which avoid the above drawbacks, the main drawback being
in our opinion the third one.

Inspecting the weights of the neural network, we also noticed that the vari- 
ances we used to characterize the past traffic were given weights so small that 
they were virtually useless. 

8.2 “Lightweight” learning strategies

8.2.1 Characterization of the past traffic 

Keeping the same neural network architecture, we modified the characterization 
of the past traffic so that the input layer now receives: 

• the current queue length, the current bit rate

• the quantity of data of the last 2 s aggregated in 100 ms windows (20
values)

• the quantity of data of the last 20 s aggregated in 1 s windows (20 values)

It should be noticed that the characterization of the traffic we use does not
require any fine-grained dynamical information (such as the interarrival time
statistics for instance), but is only built from aggregated quantities of data in
fixed-size windows. As the windows are indeed large, such a characterization should
be rather easy to implement, without requiring accurate time-stamping.
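
The input vector described above can be assembled from packet records with a few lines of code. This is a sketch under our own assumptions: packets are given as (arrival time in seconds, size) pairs, and the function names, argument order and oldest-first ordering of the windows are hypothetical.

```python
def window_volumes(packets, now, width, count):
    """Data volume in each of the `count` windows of `width` seconds ending at `now`.

    packets: iterable of (arrival_time, size) pairs; result is ordered oldest first.
    """
    totals = [0.0] * count
    start = now - width * count
    for t, size in packets:
        if start <= t < now:
            index = min(int((t - start) / width), count - 1)
            totals[index] += size
    return totals


def traffic_features(packets, now, queue_length, bit_rate):
    """42 inputs: queue length, bit rate, 20 fine windows, 20 coarse windows."""
    return ([queue_length, bit_rate]
            + window_volumes(packets, now, 0.1, 20)    # last 2 s, 100 ms windows
            + window_volumes(packets, now, 1.0, 20))   # last 20 s, 1 s windows


packets = [(19.95, 100), (19.0, 50), (5.5, 10)]
features = traffic_features(packets, now=20.0, queue_length=0.0, bit_rate=1000.0)
print(len(features))
```

Only per-window byte counters are needed, which is consistent with the remark that no accurate time-stamping is required.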

8.2.2 Basic learning algorithm 

For the three learning strategies described below, the learning algorithm is
made of four steps: the training set is read sequentially and for each new
renegotiation period (period N + 1) we have

1. a prediction step

calculate the bit rate predicted by the neural network, Dpred

2. a trace-driven simulation step

• feed the trace for period N + 1 into a queue emptied at Dpred

• calculate the maximum queue length Lmax in period N + 1

• note that the initial conditions for period N + 2 are Dpred and the
queue length obtained from the trace-driven simulation at the end of
period N + 1

3. an error evaluation step

• calculate the effective jitter tolerance Teff = Lmax / Dpred

• note that Teff > Tpcr indicates a violation of the traffic contract

4. a backpropagation step

we investigated three different possibilities for backpropagating the error
(Tpcr - Teff); they are detailed below

Figure 5: Results for Strategy 1 on a trace collected at CNET on June 18th,
1996. The training set is made of the first 2 hours, the validation set of the
next 2 hours and the rest of the trace (8 hours) is used as the test set. The
solid line is the maximum queue length predicted by the oracle; the dashed line is
the maximum queue length predicted by the neural network.

The main difference with the “heavyweight” learning strategy (Section 8.1)
is that the initial conditions are now determined on the fly from the dynamics 
of the system driven by the neural network. Hence we can more efficiently 
explore the part of the state space spanned by the system driven by the neural 
network, which should lead to faster training times. 
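
Steps 1 to 3 of the algorithm, chained over successive periods, can be sketched as follows. The predictor is passed in as a hypothetical callable `predict(queue, rate)` standing in for the neural network, each period is given as a list of data volumes per slot of length `dt`, and the effective tolerance is taken as Lmax / Dpred, which is what the relation between Lmax, PCR and Tpcr of Section 2.1 gives when solved for the tolerance. Backpropagation (step 4) is left out.

```python
def run_predictor(periods, predict, dt, t_pcr):
    """Chain prediction, trace-driven simulation and error evaluation.

    periods: list of periods, each a list of data volumes (bits) per slot.
    predict: hypothetical callable (queue, previous_rate) -> positive rate in bit/s.
    Returns one (rate, l_max, t_eff, violated) tuple per period.
    """
    queue, rate, log = 0.0, 0.0, []
    for volumes in periods:
        rate = predict(queue, rate)          # step 1: Dpred for the coming period
        l_max = queue
        for v in volumes:                    # step 2: trace-driven simulation
            queue = max(0.0, queue + v - rate * dt)
            l_max = max(l_max, queue)
        t_eff = l_max / rate                 # step 3: effective jitter tolerance
        log.append((rate, l_max, t_eff, t_eff > t_pcr))
        # queue and rate carry over as initial conditions of the next period
    return log


# A deliberately bad constant predictor: 500 bit/s against 1000 bit/s of input.
log = run_predictor([[100.0] * 10, [0.0] * 10],
                    predict=lambda queue, rate: 500.0, dt=0.1, t_pcr=0.1)
print(log)
```

Because the queue left at the end of one period is the starting point of the next, the second period of this toy run is also in violation even though no new data arrive: the initial conditions depend on what the predictor did before.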

8.2.3 Learning strategy 1: a simple-minded approach 

As is usually done, we backpropagate the error (Tpcr - Teff)^2 for every sample in
the training set until the validation error starts increasing.

This strategy converges extremely fast (typically less than a hundred
iterations on the whole training set; an iteration involves backpropagation on all
samples of the training set).

The results of this approach are given on Figure 5. 

The performance of the predictor is obviously quite poor in terms of rene- 
gotiation; however, it must be noted that the neural network shows excellent 
generalization properties: in particular, it does react to the burst of activity 
between 13:00 and 14:00, although this burst fully lies in the test set and no 
such level of activity occurs in the training or validation sets. 

8.2.4 Learning strategy 2: a conservative approach 

For this strategy we try to get a conservative behaviour of the predictor by pro- 
gressively specializing the learning process on the worst samples of the training 
set. The learning strategy can be described as follows: 

• until no error is made in the training set

- run the simulation for the whole training and validation sets

- backpropagate the error (Tpcr - Teff)^2 for the worst sample in the
training set (i.e. the largest Teff) for that run

• until no error is made in the validation set

- lower the target Tpcr to a stricter value Tpcr'

- run the simulation for the whole training and validation sets

- backpropagate the error (Tpcr' - Teff)^2 for the worst sample in the
training set (i.e. the largest Teff) for that run

This strategy also converges extremely fast, typically less than a thousand 
iterations on the training set (note that an iteration involves only one back- 
propagation on the worst sample of the training set). 
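
The two loops of Strategy 2 can be sketched as follows. This is only an illustration under stated assumptions: `simulate` and `backprop` are hypothetical stand-ins for the trace-driven simulation and for one backpropagation step, and the toy "network" below (each sample simply stores its own $\tau_{eff}$) is not the predictor used in the experiments.

```python
from typing import Callable, List, Tuple

# simulate() returns (tau_eff per training period, tau_eff per validation period)
Simulate = Callable[[], Tuple[List[float], List[float]]]
# backprop(i, target) performs one step on training sample i toward target
Backprop = Callable[[int, float], None]


def specialise_on_worst(simulate: Simulate, backprop: Backprop, target: float,
                        limit: float, monitor: str, max_iters: int = 1000) -> int:
    """Backpropagate on the worst training sample until the monitored set is clean.

    monitor is "train" (first loop of Strategy 2) or "validation" (second loop).
    Returns the number of backpropagation steps performed.
    """
    for step in range(max_iters):
        train, validation = simulate()
        monitored = train if monitor == "train" else validation
        if max(monitored) <= limit:          # no contract violation left
            return step
        # The worst sample is always taken from the training set.
        worst = max(range(len(train)), key=train.__getitem__)
        backprop(worst, target)
    return max_iters


# Toy stand-ins (hypothetical): a step overshoots the target slightly and the
# validation periods run 20% above the training periods.
state = [0.05, 0.30, 0.13]


def simulate() -> Tuple[List[float], List[float]]:
    return list(state), [1.2 * t for t in state]


def backprop(i: int, target: float) -> None:
    state[i] += 1.2 * (target - state[i])


tau, tau_prime = 0.1, 0.08
steps_1 = specialise_on_worst(simulate, backprop, tau, tau, "train")
steps_2 = specialise_on_worst(simulate, backprop, tau_prime, tau, "validation")
print(steps_1, steps_2, [round(t, 4) for t in state])
```

Only one sample is touched per iteration, which is why an iteration of this strategy is much cheaper than an iteration of Strategy 1.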

As can be seen from Figure 6, the results in terms of renegotiation are more 
satisfactory; we get only two renegotiation errors, indicated by diamonds, on 
the whole test set (8 hours) and it is clear that specializing the learning process 
Figure 6: Results for Strategy 2 on a trace collected at CNET on June 18th, 
1996. The training set is made of the first 2 hours, the validation set of the 
next 2 hours and the rest of the trace (8 hours) is used as the test set. The 
dashed curve (upper curve) is the maximum file length predicted by the neural 
network, the solid line (middle curve) is the maximum file length predicted by 
the oracle; the dotted curve (bottom curve) is the effective jitter tolerance $\tau_{eff}$ 
obtained by the neural network ($\tau_{eff} > 0.1$ s means a contract violation in the 
considered period). 



on the worst samples of the training set makes the neural network predictions 
conservative. 

The drawback of this approach is that the learning process very quickly becomes 
specialized to only one sample of the training set; surprisingly, such a strong 
specialization does not lead to very poor generalization of the network, and 
this puzzling result is left for further research (we note here that a somewhat 
similar result was obtained in [3, 4] in a different context). 

8.2.5 Learning strategy 3: the best of both worlds 

Despite its good results, we felt that Strategy 2 led to too sharp a specialization 
of the training, which could be detrimental to the generalization abilities of 
the neural network. We therefore investigated a new strategy which aims at 
combining the advantages of the two strategies above: 

• apply strategy 1, then 
Figure 7: Results for Strategy 3 on a trace collected at CNET on June 18th, 
1996. The training set is made of the first 2 hours, the validation set of the 
next 2 hours and the rest of the trace (8 hours) is used as the test set. The 
dashed curve (upper curve) is the maximum file length predicted by the neural 
network, the solid line (middle curve) is the maximum file length predicted by 
the oracle; the dotted curve (bottom curve) is the effective jitter tolerance $\tau_{eff}$ 
obtained by the neural network ($\tau_{eff} > 0.1$ s means a contract violation in the 
considered period). 



• apply strategy 2 

Hence, the neural network is taught the entire phase space before being 
made conservative by specializing on the worst samples of the training set. 

As can be seen from Figure 7, we do get the best of both Strategies 1 and 2 
with this approach: the system is conservative, as was the case for Strategy 2, 
and is more adaptive, as was the case for Strategy 1. 

The performance in terms of renegotiation is indeed excellent, as there is 
only one contract violation in the whole test set (8 hours). This burst can be 
considered as an example of a rare (hence, “unforeseeable”) event which may 
lead to a traffic contract violation. Of course, the occurrence of such an event 
is unavoidable when predictors are used to renegotiate the traffic contracts. 

If the prediction were to be implemented by the network as a service to 
the sources, the unavoidable occurrence of such events means that these traffic 
contracts fall into the category of “predictive services”: no “hardcore” QoS 
guarantees are possible (whatever the technique, prediction is indeed a risky 
Figure 8: Results for Strategy 3 on the LBL-PKT3 trace collected at Berkeley. 
The training set is made of the first half hour, the validation set of the next half 
hour and the rest of the trace (1 hour) is used as the test set. The dashed curve 
(upper curve) is the maximum file length predicted by the neural network, the 
solid line (middle curve) is the maximum file length predicted by the oracle; the 
dotted curve (bottom curve) is the effective jitter tolerance $\tau_{eff}$ obtained by 
the neural network ($\tau_{eff} > 0.1$ s means a contract violation in the considered 
period). 



business!), but the QoS should be “almost always” as required by the source [13]. 

If the prediction were to be implemented by the source itself, the network 
only guarantees the QoS corresponding to the renegotiated contract and any 
violation of this contract is the sole responsibility of the source. 

For the sake of completeness, Figure 8 shows the results of Strategy 3 when 
applied to the LBL-PKT3 trace collected at Berkeley. There are no renegotiation 
errors on the whole test set (last hour of the trace). 

8.2.6 Another experiment with Strategy 3 

In order to test the long-term validity of our predictor, we ran another 
experiment; we kept the network as it was taught above (i.e. training was performed 
on a trace collected on 18th June 1996) and used it as a pure predictor on 
a trace collected two days later (i.e. on 20th June 1996). The results are 
reported in Figure 9, and we obtain excellent results, with no renegotiation 
Figure 9: Results for Strategy 3 on a trace collected at CNET on June 20th, 
1996. The neural network was trained as described above with a trace collected 
on June 18th, 1996. The dashed curve (upper curve) is the maximum file length 
predicted by the neural network, the solid line (middle curve) is the maximum 
file length predicted by the oracle; the dotted curve (bottom curve) is the 
effective jitter tolerance $\tau_{eff}$ obtained by the neural network ($\tau_{eff} > 0.1$ s 
means a contract violation in the considered period). 



error on the whole trace (12 hours). 

Such a result shows that the characteristics captured in the neural network 
predictor by our training strategy are not strongly dependent on the trace it 
was taught and are still valid on timescales of days. This, combined with our 
fast training process and the simple measurements required for the training, 
makes the neural network approach to traffic descriptor prediction a perfectly 
viable technique ^. 

9 Discussion 

Also given in the top line of the above figures are mean bit rates (in Mb/s) 
characterizing various aspects of the experiments: 

^ There is no “magic” involved in this however ! We also tried this predictor on the LBL- 
PKT3 trace and, although the adaptivity was surprisingly good, we got very poor results in 
terms of renegotiation. 
• mean rate is the mean bit rate of the traffic; 

• mean oracle is the mean bit rate reserved by the oracle for DBR-10s; 

• static DBR is the minimum bit rate reserved for the whole trace in the 
case of a standard static DBR; 

• mean NN DBR is the mean rate reserved by the neural network predictor 
for DBR-10s. 

9.1 Comparison with the oracle 

It is clear that the reservation made by the neural network predictor is much 
larger than the reservation made by the oracle. This is easily interpreted since 
the neural network indeed tries to predict the worst future behaviour of the 
queue from the characterization of the past traffic and all the behaviours it 
has seen during the learning phase; the oracle knows the future perfectly, so 
that it makes its reservation on the basis of only one particular instantiation 
of the future behaviour of the queue, which is not necessarily a worst case 
instantiation. 

Hence, the quantitative comparison between the oracle and the predictor 
is not very informative. The main comparison should be a qualitative one as 
we already discussed: the oracle is closely tailored to the needs of the source 
and, at least for our Strategy 3, the comparison between the behaviours of the 
reservation of the oracle and of the neural network predictor shows that the 
neural network predictor indeed follows the big features of the activity of the 
source with some kind of safety margin. 

9.2 Comparison between DBR-10s and static DBR 

More useful quantitative information can be drawn from the comparison between 
the mean reservation made by the neural network predictor and the reservation 
of the best static DBR contract. 

From Figure 9 it can be seen that the mean DBR rate renegotiated by 
the neural network predictor (1.48 Mb/s) is smaller than the best static DBR 
contract (1.67 Mb/s) which could be negotiated for the whole trace (note that in 
order to negotiate such a contract you need to know the whole trace beforehand 
whereas our predictor has never seen this trace during its training process !). 

This indeed shows that neural network-based traffic contract renegotiation 
saves bandwidth while maintaining the quality of service. 
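
As a quick check of the size of this saving, the two rates quoted above for Figure 9 can be compared directly. The variable names are illustrative only.

```python
# Rates quoted in the text for the trace of Figure 9 (Mb/s).
mean_nn_dbr = 1.48   # mean rate renegotiated by the neural network predictor
static_dbr = 1.67    # best static DBR contract for the whole trace

# Relative bandwidth saved by renegotiation compared with the static contract.
saving = 1 - mean_nn_dbr / static_dbr
print(f"relative saving: {saving:.1%}")
```

This gives a reduction of roughly 11% in reserved bandwidth on this trace.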

9.3 Future work 

Although excellent results have been obtained, our neural network is far from 
being optimal. In particular, it can be seen that the neural network does not 
seem to adapt its behaviour correctly in low activity parts of the trace (see the 
CNET results in the 18:00-20:00 range). We are planning to use more sophisticated 
neural network architectures recently developed for pattern recognition 
[12, 8] in order to solve this problem. 

We are also currently extending this study to other traffic traces from dif- 
ferent origins and to different sets of parameters. We also plan to extend this 
work to the SBR traffic handling capacity. 

This presentation was restricted to the ATM context but, as the Internet 
evolves towards an Integrated Service Packet Network (ISPN) [6], it has also 
defined “traffic descriptors” based on leaky buckets which are used for the 
resource reservation in the network. Therefore the techniques developed here 
can also find applications in the Internet ISPN context. This may be even more 
natural since the in-service renegotiation capability is included in the signalling 
protocol RSVP [21]. 

10 Conclusion 

In this contribution, we have shown that the use of neural networks indeed 
allows accurate predictions of the extremal behaviour of a queue driven by a 
real traffic trace; we presented fast and intuitively simple learning algorithms 
for this difficult problem and successfully applied them to the dynamic resource 
reservation in an ATM network with a prediction horizon as large as 10 s. 

It has been shown that taking advantage of this prediction capability to 
periodically renegotiate the parameters of ATM layer traffic handling capabilities 
is beneficial in terms of reserved resources. 

Such results are extremely encouraging for the use of connectionist prediction 
techniques for the management of bursty traffic in B-ISDN networks. 
Abstract 

Many call admission control schemes for ATM-type networks focus on the cell 
loss rate as the exclusive QoS metric and therefore base their Eb (effective 
bandwidth) schemes on cell-loss rate approximations. We use simulation data 
to train an adaptive logic network (ALN) to estimate cell loss and delay; these 
estimates can then be used to compute effective bandwidths to satisfy both cell 
loss and delay. Results indicate that the ALN model is simple, computationally 
efficient, and sufficiently accurate for practical use. 
1 INTRODUCTION 

Many call admission schemes in ATM networks are based on the concept 
of effective bandwidth. The effective bandwidth of N aggregated sources is 
generally less than N times the effective bandwidth of a single source (this 
phenomenon is known as statistical multiplexing gain). Most of the effective 
bandwidth research has focused on the cell loss rate (Clr). Anick et al., 1982 
present a well-known fluid-flow model (AMS) for the case of N identical 
On/Off sources and an infinite buffer. The cell loss rate for a buffer of size S 
is approximated by the probability that the occupancy of the infinite buffer 
exceeds S. Guerin et al., 1991 propose a simplified version of the AMS model. 




A simple binomial approach is described by Murase et al., 1991: the authors’ 
proposal is to approximate the cell loss rate by the probability that the com- 
bined cell rate of all N sources exceeds the link capacity. Rege, 1994, Sykas 
et al., 1993 compare these and other methods. One limitation of them is the 
low accuracy for high loss probabilities (e.g., > 10“®) and/or low buffer sizes. 
Another limitation is the lack of delay prediction, which in some applications 
is not less critical than the loss prediction. 
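
The idea behind the binomial approach can be sketched as follows. This is an illustration of the principle stated above (overload probability of N independent On/Off sources), not necessarily the exact formula of Murase et al.; the function name and the normalisation of the link capacity as `eb * n_sources * Pcr` are assumptions made for this example.

```python
from math import comb


def binomial_overload_probability(n_sources: int, activity: float, eb: float) -> float:
    """Probability that the combined cell rate of the sources exceeds the link rate.

    Each source is On with probability `activity` (its average-to-peak ratio)
    and then emits at the peak rate Pcr. The link capacity is assumed to be
    eb * n_sources * Pcr, so overload occurs when more than eb * n_sources
    sources are On at the same time.
    """
    threshold = eb * n_sources + 1e-9   # small tolerance against rounding
    return sum(
        comb(n_sources, k) * activity**k * (1 - activity) ** (n_sources - k)
        for k in range(n_sources + 1)
        if k > threshold
    )


# Example: 25 sources with an average-to-peak ratio of 0.417.
for eb in (0.5, 0.6, 0.8, 1.0):
    print(eb, binomial_overload_probability(25, 0.417, eb))
```

Because the buffer plays no role in this estimate, it cannot capture the dependence of cell loss on buffer size, nor does it say anything about delay.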

We consider using an adaptive logic network (ALN) for estimating cell loss 
and delay. The results are compared with the results produced by two other 
approaches: multivariate non-linear regression (REG) and the AMS model. 
We have extended the latter to predict the delay as well; we refer to this as 
the D-AMS model. The ALN model is shown to predict Eb fairly accurately 
for a wide range of buffer size (1-20 times the burst length), cell loss (0.1- 
10“®) and delay (1-5000 cells). This is the non-linear region of interest since 
the analytical methods (Anick et al., 1982, Guerin et al., 1991) are inaccurate 
at high cell loss. 



2 EXPERIMENTAL SETUP 



2.1 Models 

The traffic sources requesting admission to a queue are likely to have similar 
peak rates.* Therefore, to simplify the problem at hand, we limit ourselves 
to calls with identical traffic descriptors (homogeneous QoS requirements are 
common to all Eb schemes mentioned above). 

Each traffic source has an On and an Off state. When in the On state, 
the source generates cells at a deterministic rate of Pcr cells/s; no cells are 
generated during the Off period. The On and Off periods are exponentially 
distributed with means $t_{ON}$ and $t_{OFF}$, respectively. The N sources are 
independent, but have identical mean On and Off periods. The mean burst 
length, Bl, is given by Pcr × $t_{ON}$, and the average-to-peak-rate ratio, Av/Pk, 
is given by $t_{ON} / (t_{ON} + t_{OFF})$. 



2.2 Factors affecting delay and cell loss 

The average delay Del and cell loss Clr are influenced by the number of 
calls N multiplexed at the queue, the characteristics of a single call (i.e., 
mean burst length Bl and average-to-peak-rate ratio Av/Pk), and the system 
parameters (i.e., buffer size S and service rate Bw). Decina and Toniatti, 1990 



*When it is required to multiplex sources with widely differing peak rates and/or QoS 
requirements, round-robin scheduling among multiple queues is generally preferred. 
mention that the effective bandwidth for sources whose peak rate is greater 
than roughly one-tenth of the link bandwidth is influenced by the burst length. 
The peak rate for the sources under consideration is assumed to be at least 
10 times smaller than the link rate. This has the advantage of eliminating the 
need to consider Bl while obtaining an inverted model for Eb. The peak rate 
only affects the scale of operation without changing the relative effect of the 
other parameters. Eb is the service rate Bw normalized w.r.t. Pcr and N. Its 
value lies between Av/Pk and 1. 

The reason why we choose Bl and Av/Pk as the parameters representing 
the characteristics of a single call (rather than using $t_{ON}$ and $t_{OFF}$ directly) 
is the following. Suppose that we obtain via a simulation study a formula for 
Eb in terms of Clr, $t_{ON}$, $t_{OFF}$, S and N. If we scale the peak rate used in the 
study by a factor of 10 (to find the service rate for another call type with the 
same characteristics) we will have to scale $t_{ON}$ and $t_{OFF}$ as well. Our formula 
will no longer apply since it was obtained using a call with different $t_{ON}$ and 
$t_{OFF}$. On the other hand, if we choose Bl and Av/Pk as the independent 
variables, these will remain the same after the peak rate is scaled. Hence we 
can use the formula exactly as it is. 

We chose three of the five factors for a full-factorial study: S, Eb and N. The 
traffic generated by a single source can be viewed as a video session (Hyman 
et al., 1991), with $t_{ON}$ = 25 ms, $t_{OFF}$ = 35 ms and Pcr = 14150 cells/s. This 
results in Bl = 353 and Av/Pk = 0.417. 
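
The source descriptors and their invariance under a change of peak rate can be checked numerically. The sketch below uses the definitions of Section 2.1 (Bl as peak rate times mean On time, Av/Pk as the fraction of time spent in the On state); the function name is illustrative, and the rescaled On/Off times are derived here rather than quoted from the text.

```python
from math import isclose


def source_descriptors(pcr: float, t_on: float, t_off: float) -> tuple:
    """Return (mean burst length in cells, average-to-peak-rate ratio)."""
    burst_length = pcr * t_on
    av_pk = t_on / (t_on + t_off)
    return burst_length, av_pk


# Video-like source of the training set: 14150 cells/s, 25 ms On, 35 ms Off.
bl, av_pk = source_descriptors(14150, 0.025, 0.035)
print(bl, round(av_pk, 3))

# Lower the peak rate to 1000 cells/s and stretch both periods by the same
# factor: Bl and Av/Pk are unchanged, so a model expressed in these two
# quantities still applies.
stretch = 14150 / 1000
bl_2, av_pk_2 = source_descriptors(1000, 0.025 * stretch, 0.035 * stretch)
print(bl_2, round(av_pk_2, 3))
```

The unrounded burst length is 353.75 cells, which the text quotes as 353.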



2.3 Training set 

The levels chosen for the three factors mentioned before are shown in table 1. 
When the number of calls is large, linear approximations are expected to be 
accurate (figure 1 left) and hence we do not consider more than 25 calls. The 
same observation applies when the buffer size is much larger than Bl (figure 1 
right) and so we only consider a maximum buffer size of 5000. Since Bl is 
353 cells, the maximum value of S/Bl is 22. As we were mainly interested 
in predicting Eb (its value lies in [Av/Pk, 1]), we used a fine granularity 
of 0.02. We found that when Eb was close to its minimum value of 0.417 
(corresponding to Av/Pk), the steady-state values of Clr and Del were quite 
difficult to obtain accurately, even with very long simulations. Therefore, we 
opted to use a minimum Eb of 0.45. 

150 million cells were simulated in a single experiment. A total of 10 × 
7 × 19 = 1330 experiments were run using SMURPH (Gburzynski, 1996). 
The experiments were repeated with a different seed for the random number 
generator to improve the reliability of the delay and cell loss estimates. 
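
The size of the full-factorial design can be reproduced by enumerating the levels of Table 1. In this sketch the service-rate levels are held in hundredths to avoid rounding issues, and the second range is assumed to run from 0.70 to 0.88 in steps of 0.03, which is one reading of "0.67-0.9 in steps of 0.03" that yields the stated 19 levels.

```python
from itertools import product

# Levels of Table 1.
n_calls = [1, 2, 3, 4, 5, 6, 8, 10, 15, 25]
buffer_sizes = [400, 500, 600, 800, 1000, 3000, 5000]
# 0.45 to 0.67 in steps of 0.02, then (assumed) 0.70 to 0.88 in steps of 0.03.
service_rates = [r / 100 for r in range(45, 68, 2)] + [r / 100 for r in range(70, 89, 3)]

# One simulation experiment per combination of levels.
design = list(product(n_calls, buffer_sizes, service_rates))
print(len(n_calls), len(buffer_sizes), len(service_rates), len(design))
```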
Figure 1 Delay vs. N (left), delay vs. S (right) 

2.4 Test set 

The levels chosen for the three factors above are shown in table 2. The values of 
Bl and Av/Pk were the same as in the training set. The peak cell rate (Pcr) 
was 1000 cells/s, in order to verify that the ALN and regression models correctly 
predict delay and cell loss for peak rates different from that used in the training 
set. The mean On and Off times were suitably altered to maintain Bl and Av/Pk 
at their values in the training set. The levels for the factors have been chosen 
to test the models on both interpolation and extrapolation. As in the case of 
the training set, 150 million cells were simulated in each experiment and 
the experiments were repeated to obtain better estimates. 

Figure 2 (left) shows the variation of Del with Eb for varying N and S = 



Table 1 Levels of factors for training set 

Factor         Levels                                                  Number of levels 
N. of calls    1-5, 6, 8, 10, 15, 25                                   10 
Buffer size    400, 500, 600, 800, 1000, 3000, 5000                    7 
Service rate   0.45-0.67 in steps of 0.02, 0.67-0.9 in steps of 0.03   19 



Table 2 Levels of factors for test set 

Factor         Levels                                                  Number of levels 
N. of calls    1, 2, 3, 7, 9, 12, 20, 30                               8 
Buffer size    370, 550, 900, 2000, 4000, 10000                        6 
Service rate   .44, .48, .52, .56, .6, .65, .7, .75, .8, .85, .9, .95  12 
Figure 2 Delay vs. Eb (left), Clr vs. Eb (right). Panel x-axis labels: 
Equivalent Bandwidth Fraction (N = 25) and Equivalent Bandwidth Fraction (S = 1000). 

1000, while figure 2 (right) shows Clr as a function of Eb. The qualitative 
information in these figures was used in deriving the regression and ALN 
models for delay. 

2.5 Regression 

In this section, we present an equation relating the delay and Clr to the three 
factors mentioned in section 2.2. We use for this the SPSS tool-box from the 
MATLAB package (see SPSS, 1992). 

First, we tried to obtain a regression equation for the delay in terms of 
various linear combinations of the three factors with no transformations on 
the factors or on the delay. Figure 3 (left) shows the scatter plot for the 
predicted values versus the actual values and it can be seen to be highly 
non-linear. 

After the non-linearities were removed, regression was able to explain 
99.78% of the variance in the delay (figure 3 right shows the final stage). 
The regression equation was 

$$\begin{aligned}
\log(Del) ={} & 1.2MQ \times \log(S/Bl) + 5.2768 \times Eb + 1.5555 \times \log(N) \\
& - 1.6749 \times \log(S/Bl) \times Eb - 3.6725 \times Eb \times \log(N)
\end{aligned} \qquad (1)$$

where Eb denotes the service rate. 

Figure 4 shows the same stages for cell loss ratio. The variation in cell loss 
rate was much more difficult to capture, mostly because in a large number of 
experiments, the observed Clr was 0. Since the logarithmic transformation is 
not defined in this case, we had to eliminate these experiments from consider- 
ation. Consequently, only 509 experiments were available for regression. Also, 
Figure 3 Delay scatter: before transformation (left) and after transformation 
(right). Axis label in both panels: delay obtained by simulation. 





Figure 4 Clr scatter: before transformation (left) and after transformation 
(right) 



a highly non-linear transformation had to be applied on the Clr to improve 
the quality of regression. The regression equation obtained for Clr was 



log(log^(Clr^)) = 
    -0.6433 × log(S/Bl)   14.519 × log(Eb) + 0.8781 × log(N) + 
    2.8648 × log(S/Bl) × log(Eb) + 0.2471 × (S/Bl)^   0.2910 ×          (2) 



It can be seen in figure 4 that the predicted Clr is still fairly non-linear with 
respect to the actual Clr. It is this non-linearity that lowers the quality of the 



effective bandwidth prediction from Clr, even though 99.82% of the variation 
has been explained by regression. Note that the equation for Clr requires 6 
terms whereas that for delay required only 5. 
Figure 5 Clr scatter (log transformation) 

The reason for the complicated transformation on Clr in equation 2 can be 
seen by comparing the scatter plot in figure 5 with that in figure 4. In figure 5, the 
predicted values are obtained by merely using the log transformation on the 
Clr. 





Figure 6 Bandwidth prediction: delay (left), Clr (right) 

Equations obtained by inverting equations 1 and 2 were used to predict 
the service rate requirement for a given value of delay, the buffer size to burst 
length ratio, and the number of calls. Figure 6 (left) plots the values for service 
rate obtained from the inverted regression equation against the values used in 
the simulation. It can be seen that the fit is better for the service rate obtained 
from the delay equation. Figure 6 (right) shows the service rate predictions 
for Clr.* 
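
Because the delay regression is linear in Eb once log(S/Bl) and log(N) are fixed, it can be inverted in closed form. The sketch below assumes a model of the same shape as equation 1; the class and function names are illustrative, the first coefficient is a placeholder (its value is not legible in the source), and the inputs are passed already log-transformed so that no particular logarithm base is assumed.

```python
from typing import NamedTuple


class DelayCoefficients(NamedTuple):
    a: float  # multiplies log(S/Bl)
    b: float  # multiplies Eb
    c: float  # multiplies log(N)
    d: float  # multiplies log(S/Bl) * Eb, entering with a minus sign
    e: float  # multiplies Eb * log(N), entering with a minus sign


def log_delay(k: DelayCoefficients, x: float, y: float, eb: float) -> float:
    """Forward model: x = log(S/Bl), y = log(N)."""
    return k.a * x + k.b * eb + k.c * y - k.d * x * eb - k.e * eb * y


def eb_from_delay(k: DelayCoefficients, x: float, y: float, log_del: float) -> float:
    """Service rate Eb that gives the requested log-delay."""
    slope = k.b - k.d * x - k.e * y      # coefficient of Eb in the forward model
    if slope == 0:
        raise ZeroDivisionError("delay does not depend on Eb at this point")
    return (log_del - k.a * x - k.c * y) / slope


# a = 1.2 is a placeholder; the other four values are those printed in equation 1.
coeffs = DelayCoefficients(a=1.2, b=5.2768, c=1.5555, d=1.6749, e=3.6725)
x, y = 1.0, 2.3                          # example values of log(S/Bl) and log(N)
target = log_delay(coeffs, x, y, eb=0.6)
print(eb_from_delay(coeffs, x, y, target))
```

The Clr regression is not linear in Eb, since Eb appears inside a logarithm, so its inversion requires solving for log(Eb) first.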



3 ADAPTIVE LOGIC NETWORKS (ALN) 

An Adaptive Logic Network (ALN; Armstrong and Thomas, 1996, Armstrong 
and Thomas, 1998) maps vectors of real values in Euclidean n-dimensional 
space to boolean values. The first layer of computing units consists 
of linear-threshold perceptrons that output 1 only if an inequality of 
the form $w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \ge 0$ is satisfied. The 
coefficients $w_i$ of the expression are called the weights of the unit. The boolean 
outputs of the first layer of units are combined by a tree expression of AND 
and OR operators of arbitrary fan-in to produce the output of the ALN. 

One can also view the ALN in terms of the real-valued function it represents. 
A functional computation can be derived from the ALN by taking 
combinations of linear functions where the combining operators are 
the maximum and minimum of functions. If $x_n$ is the output variable of 
the ALN, then the weights of the first layer of units are normalized to have 
$w_n = -1$ and the inequality of a unit is turned into a function of the form 
$x_n = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_{n-1} x_{n-1}$. 

The tree of maximum and minimum operators has the same form as the 
tree of OR and AND operators respectively. The linear functions have weights 
which are adapted based on training data consisting of vectors $x_1, \ldots, x_n$ that 
represent the function graph. The algorithm is like least squares fitting of 
linear pieces to data points, where a linear piece is only active for a subset 
of the training points. Given $x_1, \ldots, x_{n-1}$, a subtree of a node contains the 
active linear piece if its value (computed using the maximum and minimum 
operators according to the subtree) is less (or, respectively, greater) than the 
value of any other subtree for an AND (or, respectively, OR) node. 
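
The functional view described above can be illustrated with a small evaluator for a tree of minimum and maximum operators over linear pieces. The nested-tuple representation and the function name are assumptions made for this sketch; it does not reproduce the ALN software or its training algorithm.

```python
from typing import List, Sequence, Tuple, Union

# A node is either ("lin", [w0, w1, ..., w_{n-1}]) or ("min" | "max", [children]).
Node = Tuple[str, Union[List[float], list]]


def evaluate(node: Node, x: Sequence[float]) -> float:
    """Value of the piecewise linear function at the input vector x."""
    kind, payload = node
    if kind == "lin":
        w0, *w = payload
        return w0 + sum(wi * xi for wi, xi in zip(w, x))
    values = [evaluate(child, x) for child in payload]
    # "min" plays the role of an AND node, "max" the role of an OR node.
    return min(values) if kind == "min" else max(values)


# One input variable: f(x) = min(x, max(1, 3 - x)).
tree = ("min", [("lin", [0.0, 1.0]),
                ("max", [("lin", [1.0, 0.0]), ("lin", [3.0, -1.0])])])

print(evaluate(tree, [0.5]), evaluate(tree, [4.0]))
```

At x = 0.5 the active piece is the first linear function; at x = 4 it is the constant piece under the maximum node.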

The software that was used in the experiments reported on below refines the 
piecewise linear functions produced by the above ALN by inserting quadratic 
fillets at each junction of two linear pieces so that the overall function is 
continuously differentiable. 

ALNs have several advantages for ATM traffic characterization: 

• The normalized weights of the active linear pieces are partial derivatives of 
the output variable with respect to the input variables. Hence the partial 
derivatives of the learned functions can be directly controlled. 

• If an ALN represents a function $x_n = f(x_1, \ldots, x_{n-1})$ which is monotonic 



*Note that adjacent points belong to different experiments and are not related. However, 
they are joined by lines as this makes it easier to distinguish the predicted values from the 
simulation values. 
increasing in some variable $x_i$, then a functional computation can be 
derived from that ALN which computes the corresponding function inverse, 
$x_i = g(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$. This uses the coefficients of the same 
linear pieces, combined in a different way. 

• The ALN does not require the predictor variables to be scaled or normalized. 
This speeds up the model development process as well as the use of the 
model for prediction. 

Note that most performance functions are naturally monotonic. Forcing the 
learned function to be monotonic or convex makes it difficult for the function 
learned by an ALN to be influenced by the noise in training points; hence 
overtraining, which prevents good generalization in other neural networks, 
can be avoided in many cases. 
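
One way to read the inversion property listed above is sketched below: each linear piece is solved for the chosen input, and minimum and maximum nodes are exchanged. The exchange rule is an assumption of this sketch (it holds when every piece is strictly increasing in the inverted variable); the source only states that the same pieces are "combined in a different way". The tuple representation is the illustrative one used for the evaluator and is repeated here so that the example is self-contained.

```python
from typing import Sequence


def evaluate(node, x: Sequence[float]) -> float:
    """Evaluate a tree of ("lin", weights) leaves under "min"/"max" nodes."""
    kind, payload = node
    if kind == "lin":
        w0, *w = payload
        return w0 + sum(wi * xi for wi, xi in zip(w, x))
    values = [evaluate(child, x) for child in payload]
    return min(values) if kind == "min" else max(values)


def invert(node, i: int):
    """Tree computing input i from the remaining inputs followed by the old output."""
    kind, payload = node
    if kind == "lin":
        w0, *w = payload
        wi = w[i]
        if wi <= 0:
            raise ValueError("piece is not strictly increasing in the chosen input")
        others = [-wj / wi for j, wj in enumerate(w) if j != i]
        # x_i = (x_n - w0 - sum of the other terms) / w_i
        return ("lin", [-w0 / wi, *others, 1.0 / wi])
    swapped = "max" if kind == "min" else "min"
    return (swapped, [invert(child, i) for child in payload])


# f(x) = min(2x, 0.5x + 3) is increasing in x; its inverse is max(y/2, 2y - 6).
f = ("min", [("lin", [0.0, 2.0]), ("lin", [3.0, 0.5])])
g = invert(f, 0)
for x in (1.0, 4.0):
    y = evaluate(f, [x])
    print(x, y, evaluate(g, [y]))
```

This is the property that lets a model trained to predict delay or cell loss from Eb be reused to compute the Eb needed for a target delay or loss.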



3.1 Choosing the epsilon values 

The ALN software allows the user to specify a smoothing parameter ($\epsilon$) 
associated with each variable that expresses the half-length of an interval which 
has to be covered by each point in a training set in each axis. Increasing $\epsilon$ has 
the effect of smoothing the function in the direction of the variable, but the 
network cannot discriminate between points separated by less than $\epsilon$. 

In the case of the effective bandwidth problem, we have three input 
variables: Eb, log(S) and N. The minimum value of the epsilon, $\epsilon_{i,min}$, for an 
input variable, i, is given by half the smallest interval, $l_{i,min}$, between two 
adjacent levels of that variable in the training set. The maximum value for 
epsilon, $\epsilon_{i,max}$, is given by $l_{i,min}$. This prevents over-smoothing of the learned 
function. 
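
The rule for the admissible range of epsilon can be written down directly. In this sketch the function name is illustrative, and the Eb levels are held in hundredths so that the intervals are exact integers.

```python
from typing import Iterable, Tuple


def epsilon_bounds(levels: Iterable[float]) -> Tuple[float, float]:
    """Return (epsilon_min, epsilon_max) for one input variable.

    epsilon_min is half the smallest interval between adjacent levels and
    epsilon_max is the smallest interval itself.
    """
    ordered = sorted(levels)
    smallest = min(b - a for a, b in zip(ordered, ordered[1:]))
    return smallest / 2, smallest


# Number of calls in the training set.
print(epsilon_bounds([1, 2, 3, 4, 5, 6, 8, 10, 15, 25]))

# Eb levels 0.45, 0.47, ..., 0.67 expressed in hundredths.
print(epsilon_bounds(range(45, 68, 2)))
```

The results, 0.5 and 1 for N and 0.01 and 0.02 for Eb, agree with the values listed in Table 3.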



Table 3 Epsilons for D-ALN and C-ALN 

Predictor variable   Smallest interval   $\epsilon_{i,min}$   $\epsilon_{i,max}$ 
Eb                   0.02                0.01                 0.02 
log(S)               0.1                 0.05                 0.1 
N                    1                   0.5                  1 



To prevent overtraining, it is customary to evaluate at periodic intervals 
the trained ALN on the test set. This has the disadvantage that the final 
evaluation on the test set does not really test the generalization of the ALN, 
since the ALN ‘has seen’ the test set. Instead, we divide the training set into 
two portions. Set 1 contains the simulation data for Eb = 0.45, 0.49, 0.53, ... 
and is used as the training data. Set 2 contains the data for Eb = 0.47, 0.51, 0.55, ... 
After every set of ten passes (epochs) through the training data (set 
1), the trained ALN is evaluated on set 2. The test data is used only when 
a satisfactorily trained ALN is obtained. It must be noted though that this 
partitioning of the training data has the effect of changing $\epsilon_{min}$ and $\epsilon_{max}$ for 
Eb to 0.02 and 0.04, respectively. 
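
The effect of the partition on the epsilon range for Eb can be illustrated as follows. Only the first range of service-rate levels (0.45 to 0.67 in steps of 0.02) is shown, in hundredths; how the coarser levels above 0.67 were divided between the two sets is not stated, so they are left out of this sketch.

```python
def epsilon_bounds(levels):
    """(half the smallest interval, smallest interval) between adjacent levels."""
    ordered = sorted(levels)
    smallest = min(b - a for a, b in zip(ordered, ordered[1:]))
    return smallest / 2, smallest


eb_hundredths = list(range(45, 68, 2))   # 0.45, 0.47, ..., 0.67
set_1 = eb_hundredths[0::2]              # training data: 0.45, 0.49, 0.53, ...
set_2 = eb_hundredths[1::2]              # checking data: 0.47, 0.51, 0.55, ...

eps_min, eps_max = (value / 100 for value in epsilon_bounds(set_1))
print(set_1, set_2, eps_min, eps_max)
```

Taking every other level doubles the spacing of the training levels, which is why both bounds double.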

Overtraining is also reduced by constraining the slope of Del w.r.t. the other 
variables; in this case, delay was constrained to decrease monotonically as Eb 
or N increases as well as to increase monotonically as a function of buffer size. 



3.2 ALN model for cell delay 

The inputs to the D-ALN are S, Eb, N and Del. Because of the large ranges of 
S and Del, we chose to train the D-ALN on log(S) and log(Del), respectively. 
Since the ALN is scale-invariant, we did not have to normalize the inputs as 
is usually done with other neural networks. 

The ALN software allows the user to specify $\epsilon_{op}$ for the output variable. 
Whenever the root mean square error (RMSE) on the training set is greater 
than $\epsilon_{op}$, the ALN automatically grows in size to reduce the error. If $\epsilon_{op}$ is 
set to an unnecessarily small value, the resulting ALN may become very large 
without a corresponding reduction in error on the test set. Too large a value of 
epsilon may prevent the ALN from learning satisfactorily due to inadequate 
size. 

In order to determine the optimum epsilon on the output variable, $\epsilon_{del}$, for 
the D-ALN, three different values were tried. Table 4 shows the RMSE on 
the training set and the average relative error (ARE) on set 2 for each value 
of epsilon. From the table, we can see that the lowest ARE (0.73%) is 


Table 4 Choosing the output epsilon for D-ALN (entries: <RMSE, ARE>) 

Epoch   $\epsilon$ = 0.05      $\epsilon$ = 0.01      $\epsilon$ = 0.003 
10      <0.0273, 1.28%>   <0.0271, 1.16%>   <0.0271, 1.16%> 
20      <0.0162, 0.84%>   <0.0142, 0.8%>    <0.0142, 0.8%> 
30      <0.0139, 0.79%>   <0.0115, 0.79%>   <0.0115, 0.78%> 
40      <0.0128, 0.79%>   <0.0096, 0.77%>   <0.0095, 0.75%> 
50      <0.0112, 0.81%>   <0.0084, 0.75%>   <0.0085, 0.75%> 
60      <0.0114, 0.88%>   <0.0079, 0.76%>   <0.0079, 0.79%> 
70      <0.0108, 0.78%>   <0.0070, 0.73%>   <0.0070, 0.78%> 
80      <0.0110, 0.97%>   <0.0070, 0.8%>    <0.0068, 0.84%> 
90      <0.0112, 0.97%>   <0.0064, 0.81%>   <0.0064, 0.87%> 
100     <0.0106, 1.05%>   <0.0058, 0.79%>   <0.0060, 0.84%> 
achieved for $\epsilon$ = 0.01. Hence $\epsilon_{del}$ is chosen as 0.01. The table also shows the 
point at which training, if continued, would result in poor generalization. For 
example, when $\epsilon$ = 0.01, ARE generally decreases over the first 70 epochs 
and then increases. This indicates that training should be stopped after 70 
epochs. 
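
The stopping point can be read off mechanically from the checking-set error. The sketch below applies this to the ARE values of the $\epsilon$ = 0.01 column of Table 4; the function name is illustrative, and picking the global minimum is a simplification of what one would do during an actual training run, where later epochs are not yet known.

```python
from typing import Sequence, Tuple


def best_epoch(epochs: Sequence[int], errors: Sequence[float]) -> Tuple[int, float]:
    """Epoch with the lowest checking-set error (earliest epoch wins ties)."""
    error, epoch = min(zip(errors, epochs))
    return epoch, error


epochs = list(range(10, 101, 10))
# ARE on set 2 in percent, epsilon = 0.01 column of Table 4.
are_percent = [1.16, 0.80, 0.79, 0.77, 0.75, 0.76, 0.73, 0.80, 0.81, 0.79]

print(best_epoch(epochs, are_percent))
```

Note that the RMSE on the training set keeps falling after this point, which is the usual signature of overtraining.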

Next, we try to determine the optimum epsilon for each of the input 
variables in a similar manner. For each variable, the values tried are $\epsilon_{min}$, $\epsilon_{max}$ 
and $(\epsilon_{min} + \epsilon_{max})/2$. The epsilon with the lowest ARE is then chosen. 

The software tries to lower the RMSE to a specified level, $rms_{min}$. Setting 
$rms_{min}$ to a value much below $\epsilon_{op}$ has little advantage. On the other hand, 
setting $rms_{min}$ to $\epsilon_{op}$ may not result in a satisfactorily trained ALN. Table 5 
shows the RMSE and ARE for three different values of $rms_{min}$. It is clear 



Table 5 Choosing $rms_{min}$ for D-ALN (entries: <RMSE, ARE>) 

Epoch   $rms_{min}$ = 0.01   $rms_{min}$ = 0.005   $rms_{min}$ = 0.001 
10      <0.0277, 1.2%>    <0.0277, 1.2%>    <0.0277, 1.2%> 
20      <0.0143, 0.71%>   <0.0143, 0.71%>   <0.0143, 0.71%> 
30      <0.0112, 0.69%>   <0.0112, 0.69%>   <0.0112, 0.69%> 
40      <0.0103, 0.71%>   <0.0103, 0.71%>   <0.0103, 0.71%> 
50      <0.0086, 0.72%>   <0.0086, 0.72%>   <0.0086, 0.72%> 
60      -                 <0.0076, 0.77%>   <0.0076, 0.77%> 
70      -                 <0.0070, 0.84%>   <0.0070, 0.84%> 
80      -                 <0.0068, 0.85%>   <0.0068, 0.85%> 
90      -                 <0.0064, 0.82%>   <0.0064, 0.82%> 
100     -                 <0.0063, 0.90%>   <0.0063, 0.9%> 



that there is little advantage in choosing rmSmin < 0.01. It can also be seen 
that training should be stopped after 30 epochs. 
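
The stopping point read off Table 5 can be reproduced with a few lines of
Python. This is only an illustrative sketch: the ARE values are those of
the rmsmin = 0.005 column of Table 5, and the helper name
best_stopping_epoch is hypothetical, not part of the ALN software.

```python
def best_stopping_epoch(are_by_epoch):
    """Return the epoch at which the average relative error (ARE) is lowest.

    Training beyond this epoch lowers the RMSE further but lets the ARE
    rise again, which is the overtraining symptom discussed in the text.
    """
    return min(are_by_epoch, key=are_by_epoch.get)


# ARE (in percent) for rmsmin = 0.005, copied from Table 5.
are_percent = {
    10: 1.2, 20: 0.71, 30: 0.69, 40: 0.71, 50: 0.72,
    60: 0.77, 70: 0.84, 80: 0.85, 90: 0.82, 100: 0.90,
}

stop_epoch = best_stopping_epoch(are_percent)
print(stop_epoch)  # 30
```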

Figure 7 compares the estimates of the D-ALN on the training and test
sets with simulation results. Since the test set was generated for Pcr = 1000
cells/s while the training set used Pcr = 14150 cells/s, the figure shows that
the trained ALN can be used to predict delays for other values of Pcr fairly
accurately as long as Bl and Av/Pk remain the same.



3.3 ALN model for cell loss 

The inputs to the C-ALN are the same as those to the D-ALN. The ALN
is trained on log(Clr). The points in the training set for which Clr = 0 are
eliminated. This results in the C-ALN being trained on 899 data points, which
is still adequate considering that we have only three input variables.





Figure 7 D-ALN model predictions: training set (left) and test set (right) 



As in the case of the D-ALN, the training was done on one half of the
training set, with the other half being used to check for overtraining. The
learned function was constrained by specifying that the Clr is non-decreasing
for an increase in Eb, N, or S.

The optimum epsilons for Eb, log(S) and N were obtained, in each case
being equal to the respective minimum epsilons. The optimum output epsilon,
ε_clr, was found to be 0.05. The best setting for rmsmin was also 0.05. Training
was stopped after 40 epochs.





Figure 8 C-ALN model predictions: training set (left) and test set (right) 

Figure 8 compares the predictions of the C-ALN model on the training and
test sets with simulation results. As in the case of Del, the results obtained
by evaluating the trained model on the test set show that the ALN can be
used to predict Clr for different values of Pcr. Section 4 presents a numerical
comparison of the Clr predictions by the C-AMS, C-REG and C-ALN models.

The performance of the C-ALN model on the test set is not as good as 
for the D-ALN model. The maximum relative error on the test set (19.9%) is
almost twice that of the D-ALN (11.8%). We can also see a larger scatter at 
low loss (< 10”^) in figure 8. This is primarily because of the smaller training 
set due to the missing elements corresponding to zero Clr. These values could 
be replaced by more accurate estimates from longer simulation runs at high 
service rates and large buffer sizes. 

Figure 9 (left) compares D-ALN model predictions for N = 30 (a value not
in the training set) with simulation values. Figure 9 (right) shows the ability
of the C-ALN model to extrapolate accurately.





Figure 9 Extrapolation for N = 30: D-ALN model (left) and C-ALN model 
(right) 



3.4 Residual analysis for ALN models 

Since the ALN uses the ordinary least squares (OLS) principle to determine 
the orientation of the linear pieces, the trained ALN is subject to the assump- 
tions inherent in the use of OLS. 

The assumptions made in the case of OLS are (Gunst and Mason, 1980): 

• Predictor variables are non-stochastic and measured without error. Since 
the predictor variables are controlled inputs to the simulation, this state- 
ment is true. 

• Model error terms follow a normal probability distribution. This can be
seen to be approximately true from the histogram plots of log(Del) and
log(Clr) residuals (figure 10).

• Any two errors are independent of each other. The presence of correlation 
reduces the reliability of the model. To verify this, we plot the residuals 
against the values obtained by simulation (figure 11). 
Figure 10 Histograms of residuals: D-ALN (left) and C-ALN (right)



• Model error terms have zero means, are uncorrelated, and have constant 
variances. The first of these can be seen to be true by observing that the 
standard deviation limits appear equidistant from the horizontal axis (fig- 
ure 11). The second condition is generally true for databases compiled from 
controlled laboratory experiments. The third assumption is discussed be- 
low. 

From the log(Del)-residuals plot, we can see that the errors are randomly
distributed on either side of the horizontal axis, indicating that there is no
systematic error. Since the log function is nonlinear for values of the abscissa
< 10, we have eliminated these values from the plot. In any case, we are more
interested in delay predictions at higher delays.
Figure 11 ALN residuals plots: log(Del) (left) and log(Clr) (right)

The log(Clr)-residuals plot shows that the residuals are fairly random for
Clr > 10“®. For lower cell loss, the simulation results themselves are
inaccurate. The residuals are seen to increase in magnitude as log(Clr)
decreases below -4, leading us to conclude that there is a small amount of
heteroscedasticity (unequal variances).



4 COMPARISON OF DELAY AND CELL LOSS PREDICTIONS 

We compare here the delay and cell loss predictions by the AMS, ALN and 
REG models against the delay values obtained from the simulations. 

Table 6 makes a numerical comparison of the delay predictions based on 
the average and maximum relative errors* in the test and training sets. From
the table, we can see that the D-ALN performs the best on both the test and 
training sets while the D-REG model comes a close second. The average error 
of the D-ALN on the test set is almost half that of the D-REG model and 
only one-third that of D-AMS. The similarity of the average errors on the test 



Table 6 Numerical comparison of delay predictions

          Training set                    Test set
Method    Maximum error  Average error    Maximum error  Average error
D-AMS     14.6%          3.3%             16.1%          3.0%
D-REG     11.9%          1.95%            12.2%          2.3%
D-ALN     4.93%          0.6%             11.8%          1.32%



and training sets for both the simulation-based models indicates that we have 
been successful in curtailing model overtraining. 

From the scatter plots, it appears that the C-ALN model does very well
at high cell loss, but its performance declines at lower cell loss. The C-REG 
model appears more consistent. Both models clearly perform better than the 
C-AMS model. 

Table 7 makes a numerical comparison of cell loss predictions based on the 
average and maximum relative errors in the test and training sets. We see 
that the C-ALN is considerably better than the analytical model, with its 
average error on the test set being only one-eighth that of the latter. The 
C-REG model is closer to the C-ALN model in terms of average error, but its 
maximum error is much larger. This indicates that the simulation-based mod- 
els can be fairly accurate while offering the benefit of being computationally 
inexpensive. 
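
The error measures reported in Tables 6 and 7 can be sketched as
follows. The sketch assumes that the relative error of a point is
|log(prediction) - log(simulation)| / |log(simulation)| and that base-10
logarithms are used; the text only states that errors are computed on
log(Del) and log(Clr), so both choices, and the function name
relative_errors, are assumptions made for illustration.

```python
import math


def relative_errors(predicted, simulated):
    """Return (maximum, average) relative error computed on log10 values.

    Assumes every simulated value differs from 1, so that log10(simulated)
    is non-zero, and that all values are strictly positive.
    """
    errors = []
    for p, s in zip(predicted, simulated):
        log_p, log_s = math.log10(p), math.log10(s)
        errors.append(abs(log_p - log_s) / abs(log_s))
    return max(errors), sum(errors) / len(errors)


# Hypothetical cell loss ratios, used only to exercise the function.
max_err, avg_err = relative_errors([1e-3, 1e-4], [1e-3, 1e-5])
print(round(max_err, 3), round(avg_err, 3))
```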



*The errors for the delay and cell loss models are computed by comparing against simulation
results for log(Del) and log(Clr), respectively.
Table 7 Numerical comparison of cell loss predictions

          Training set                    Test set
Method    Maximum error  Average error    Maximum error  Average error
C-AMS     56.5%          21.7%            61.7%          24.0%
C-REG     17.1%          3.0%             32.4%          5.2%
C-ALN     19.28%         1.27%            19.9%          3.85%




5 APPLICATION: EFFECTIVE BANDWIDTHS 

In the preceding section, we examined three methods of estimating the cell
loss and mean delay in our system. We now apply those methods to effective
bandwidth computation. We compute effective bandwidths that satisfy cell
loss requirements, mean delay requirements, and both. Two requirement sets
are considered: the first requirement set (Clr < 10“^, delay < 100)
corresponds to a situation with stringent delay and jitter requirements but
relatively high tolerance for loss (e.g., real-time video); the second
requirement set (Clr < 10“^^, delay < 1000) corresponds to the more common
case where low loss is required, but delay can be tolerated. Throughout this
section, we set Pcr = 14150, t_ON = 0.025, and t_OFF = 0.035. This yields an
Scr (sustained rate) of 0.417 x Pcr = 5896, since
t_ON / (t_ON + t_OFF) = 0.025 / 0.060 = 0.417. The bandwidths are plotted per
source and relative to the Pcr, and are thus in [0.417, 1].

Of the three methods, only the ALN could be explicitly inverted; to obtain
effective bandwidths from the regression function, and in the AMS model, we
used a simple binary iteration (bisection) approach. We found that generally,
accurate results can be obtained in only 5-10 iterations. The declared range of
input values for the ALN was [0.4, 1], and the inverted ALN therefore gave
values in [0.4, 1], sometimes returning values slightly below the Scr.
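
The binary iteration used for the regression and AMS models can be
sketched as below. The sketch assumes that the predicted metric (delay or
cell loss) is non-increasing in the bandwidth; the function name
invert_by_bisection and the toy_delay model are hypothetical stand-ins
and are not the models of this paper.

```python
def invert_by_bisection(metric, target, lo, hi, iterations=10):
    """Find the smallest bandwidth in [lo, hi] with metric(bandwidth) <= target.

    `metric` is assumed to be non-increasing in the bandwidth. Each
    iteration halves the search interval, so 10 iterations narrow an
    interval of width 0.6 down to about 0.0006.
    """
    if metric(hi) > target:
        raise ValueError("requirement cannot be met even at the upper bound")
    if metric(lo) <= target:
        return lo  # the requirement is already met at the lowest bandwidth
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if metric(mid) <= target:
            hi = mid  # mid is sufficient: try a smaller bandwidth
        else:
            lo = mid  # mid is insufficient: a larger bandwidth is needed
    return hi  # conservative end of the final interval


def toy_delay(bandwidth):
    """Hypothetical delay model, used only to exercise the search."""
    return 50.0 / bandwidth


bw = invert_by_bisection(toy_delay, target=100.0, lo=0.4, hi=1.0)
print(bw)
```

Returning the upper end of the final interval keeps the estimate on the
safe side: the returned bandwidth always satisfies the requirement under
the model being inverted.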



5.1 Effective bandwidth vs. number of calls 

We compute effective bandwidths as a function of the number of calls, which 
we vary from 1 to 20. The buffer size S is fixed at 5000. The results are plotted 
in figure 12 (requirement set 1 on the left, requirement set 2 on the right).
There are six lines per requirement set: for each of the three approximation 
methods there is a line for the delay-based effective bandwidth and another 
for the cell-loss-based effective bandwidth. The effective bandwidth needed to 
satisfy both delay and cell loss requirements is simply the maximum of the 
two lines. 

We note that the delay requirement is dominant in requirement set 1. The 
three estimates for delay-based effective bandwidth are very close to each other 
Figure 12 Effective bandwidth vs. N (panel title: effective bandwidth and
number of sources, S = 5000, clr <= 0.0001, del <= 1000)



and show the typical decaying shape (indicating a statistical multiplexing
gain). The estimates for the loss-based effective bandwidth are close to the
Scr even for N = 1 and do not change when N is increased. In this case,
it is clear that the ALN underestimates the effective bandwidth, because the
estimate is about 5% below the Scr.

For requirement set 2, the loss requirement dominates slightly. As expected, 
the tighter loss requirements increased the loss-based effective bandwidth, 
and the less stringent delay requirements reduced the delay-based effective 
bandwidth. 



5.2 Effective bandwidth vs. buffer size 

We compute the effective bandwidths as a function of the buffer size, which
we vary from 1 to 10000. The number of calls N is fixed at 10. The results 
are plotted in figure 13 (requirement set 1 on the left, requirement set 2 on 
the right). Once again, there are six lines per requirement set: for each of 
the three approximation methods, there is a line for the delay-based effective 
bandwidth and another for the cell-loss-based effective bandwidth. 

The values for delay-based effective bandwidth computed by the three meth- 
ods exhibit a characteristic step form: the delay requirements can be satisfied 
by any bandwidth if the buffer size is less than the delay requirement; in 
our figure, the dots are plotted close to the sustained cell rate line. Once the 
buffer size is relatively large, adding buffers leaves the mean delay (and there- 
fore the delay-based effective bandwidth) at a constant level. Between these 
two extremes, there is a narrow range of buffer values where an increase in 
buffer size leads to an increase in mean delay. The reason for this increase is 
a simultaneous sharp decrease in the cell loss rate, resulting in the delay of 
cells that would otherwise be dropped. 
Figure 13 Effective bandwidth vs. S

Comparing the graphs for loss-based and delay-based effective bandwidth, 
we note that the cell loss requirement dominates for small buffer sizes and that 
the delay requirement dominates for larger buffer sizes. The crossover point 
between these two regions depends on the requirements and on the traffic 
characteristics. There is no increase in statistical multiplexing gain once the 
cross-over point has been reached. 

The three estimates for loss-based effective bandwidth are divergent when 
effective bandwidth is high (i.e., when the buffer size is small), and somewhat 
closer when effective bandwidth is low. This can be explained by the fact 
that the regression function and the ALN are based on simulations of buffer 
sizes > 400, and are therefore inaccurate when buffer sizes are small. In the 
case of delay-based effective bandwidths, the three estimates are very close to 
each other. The loss-based curves have a shape similar to those obtained by
(Rege, 1994).

6 CONCLUSIONS 

We have discussed a simple and reasonably accurate scheme for predicting 
the delay and cell loss when a number of bursty sources are multiplexed at a 
link with finite buffer space. Unlike other schemes, our model uses the number 
of sources N as an input, leading to delay and loss computation times that 
are independent of the number of sources being multiplexed. Though we have 
used On/Off bursty sources in order to compare the results with other effective
bandwidth schemes, the techniques presented here can be extended to complex 
sources that cannot easily be modeled analytically, e.g., aggregate LAN traffic, 
MPEG traffic. 

The ALN model has the advantage over other neural networks that the same 
network that is trained to predict cell loss or delay can be inverted to predict 
effective bandwidth instead. This means that the ALN can make effective
bandwidth predictions as quickly as it can make delay or loss predictions. 
Since the evaluation time is very small - of the order of milliseconds - ALNs 
are highly suitable for use in real-time CAC. 

Note that although the regression model is only somewhat worse than ALN, 
it was very painful to build (mostly by educated trial and error), it cannot be
easily reversed, and it is completely useless (must be rebuilt practically from 
scratch) for a traffic pattern with different characteristics. 

ALN implementation in hardware is easy because the ALNs are composed 
of AND gates, OR gates and simple linear threshold elements. This can result 
in an increase in evaluation speeds by an order of magnitude or more. 
Abstract In this paper we propose a simple Fair Cell Discarding (FCD) algorithm
with virtual per-VC queueing. The main objective of the FCD algorithm is
to redistribute the cell losses according to each connection's QoS
requirement. In case the buffer is full, the newly arrived cell is not
automatically discarded; rather, a cell is discarded from a sub-queue u,
which is selected according to a fairness criterion defined by the
connections' mean arrival rates and their QoS (Quality of Service)
requirements, such as CLR (Cell Loss Ratio). Extensive analysis and
simulations have been conducted to evaluate the properties of the FCD
algorithm and to validate its effectiveness under various traffic and switch
configurations. Our study reveals that a very simple approach like FCD can
actually differentiate connections with different traffic characteristics
and QoS requirements, and satisfy their specified QoS as long as a minimum
amount of resources (bandwidth and total buffer space) is allocated for
guaranteeing the aggregate QoS/CLR requirement of all active connections.

Keywords: ATM Networks, quality of service, per-VC queueing, fairness, selective cell
discard.



1. INTRODUCTION 

ATM networks have become a reality. However, how to successfully
provide a specified QoS (Quality of Service), and in particular how to
differentiate connections with different QoS requirements, is an
important open issue which will impact the success of ATM technology in
heterogeneous network environments and yet is far from a satisfying
resolution.

In this paper, we argue that in order to guarantee each individual
connection's QoS, a per-VC queueing architecture should be used, and
intelligence must be added to the cell discarding process. We tackle the
issue by proposing a Fair Cell Discarding (FCD) algorithm with virtual
per-VC queueing.

In the past, much effort has been spent focusing on methods for
increasing network efficiency by maximizing statistical gain while still
guaranteeing connections' QoS, and most of that work is based on a
single FIFO (First In First Out) buffer for all connections. The obvious
problem with this approach is that there is no way that one can
distinguish one connection from another without adding some costly
enhancement. A previous study (Mang et al., 1996) shows that in a single
FIFO queueing system, if the cells are dropped upon arrival in case of
buffer overflow, for a given set of connections, the burstier connections
experience a higher CLR (Cell Loss Ratio) and actually exceed their
QoS threshold, while the QoS of aggregate traffic is still satisfied.

In contrast to a single FIFO queueing architecture, per-VC queueing
provides a more flexible and natural architecture for enhancing traffic
management at ATM switches. Assuming a per-VC queueing architecture
at the ATM switch, which is already a reality, one needs to decide how
to fairly distribute shared resources across active connections in order
to provide QoS assurance to individual (or groups of) connections.
Resource re-distribution can be in various forms, i.e. service scheduling
(Parekh et al., 1993) (Golestani, 1994) (Lee et al., 1994) (Demers et
al., 1989) (Archambault et al., 1996), buffer management (Choudhury
et al., 1996) (Collier et al., 1996) (Takagi et al., 1991) and selective cell
discarding (Yang et al., 1996) (Conway et al., 1996) (Kawahara et al.,
1996) (Chen et al., 1996) (Heyman et al., 1992) (Wilson et al., 1996) at
moments of resource scarcity. Service scheduling primarily deals with
how to fairly allocate available bandwidth to a set of connections. It
regulates the queue length distribution for each logical sub-queue, but has
no control over which cell should be dropped or discarded in the event
of buffer overflow. This in turn is handled through buffer management
and selective cell discarding schemes. They both control access to the
buffer space.

A variety of buffer sharing strategies is summarized and analyzed in
(Collier et al., 1996) under non-uniform bursty traffic. Some well known
strategies are complete partitioning, complete sharing, partial sharing,
sharing with minimum allocation, sharing with maximum queue length,
and fair sharing. All these sharing schemes require some pre-calculated
threshold values, which demand prior knowledge of traffic characteristics
and of the service discipline to be used, and, therefore, are very
difficult to determine. In (Choudhury et al., 1996), a dynamic queue
length thresholds scheme is proposed to selectively drop packets where
the maximum permissible length for any individual queue at any instant
of time is a function of the unused buffering in the switch. Compared
with the static threshold scheme, this scheme automatically adapts to
load variation among output queues, and hence it is more robust.
However, the assurance of cell loss requirements for individual connections
is not addressed in any of the above schemes.

A weighted fair blocking (WFB) mechanism for discrete-time
multiplexing is proposed in (Conway et al., 1996). WFB rejects some of
the packets in a batch arrival when there is an insufficient number of
available buffers. Although it is claimed that WFB decouples buffer
dimensioning from parameterization of the discard mechanism, the
process of determining appropriate weights so as to satisfy different loss
requirements in a heterogeneous traffic environment remains as
difficult as determining appropriate threshold values in any threshold-based
control policy. The fact that a linear programming approach is used in
the calculation of the packet selection probabilities, and that prior
knowledge of source traffic characteristics is required, further prevents
WFB from being a practical solution.

In (Heyman et al., 1992) and (Yang et al., 1996) the cell loss performance
of individual connections is addressed through selective discarding. In
(Heyman et al., 1992), the ATM multiplexer keeps track of accumulated
cell losses for all active connections. When a new cell arrives and the
buffer is full, that cell is not automatically dropped. Instead, a cell from
the connection that has the lowest current accumulated loss rate (among
all connections that have cells currently in the buffer) is dropped. The
arriving cell is dropped only if its connection has the lowest loss rate
so far. This approach is based on the homogeneous traffic assumption.
Since it requires on-line measurements of both the number of losses and
the number of arrivals, this approach can only serve as an illustration of
how selective discarding affects the cell loss performance of individual
connections rather than as a practical solution. A generalized version of
the above, a "QoS-scheme", is given in (Yang et al., 1996) to deal with
situations where traffic streams have different traffic characteristics as
well as different loss (QoS) requirements. When a cell arrives and the
buffer is full, a cell is discarded only if it belongs to a connection with
the smallest ratio between its loss ratio measurement and its loss
requirement, a predefined number. The limitations of this QoS-scheme are
that on-line measurements of cell loss ratio are required, and it is not
effective if all connections are not equally demanding.

The main objective of the FCD algorithm we propose here is to re- 
distribute cell losses according to each connection’s QoS requirement. 
The model we consider in our study is an ATM statistical multiplexer 
with a complete shared output port buffering. FCD employs a fairness 
criterion defined as a function of traffic load and CLR requirements. We 
define connection x's Fair Ratio of Loss (FRL) as:

$$R_x^{\mathrm{fair}} = \frac{\lambda_x p_x}{\sum_{i=1}^{N} \lambda_i p_i} \qquad (1.1)$$

Here $\lambda_x$ stands for the mean arrival rate of connection x, N is the
total number of active connections, and $p_x$ is connection x's CLR
requirement. FCD employs an approach similar to the one suggested in
(Heyman et al., 1992) and (Yang et al., 1996) where, in case the buffer is
full, the newly arriving cell is not automatically discarded; rather, a cell
is discarded from a sub-queue u which is selected according to the fairness
criterion $R_x^{\mathrm{fair}}$. Here, for the first time, we provide an
explicit implementation for how to select and discard cells. In the above
existing methods only observations and selective discard principles are
reported. None of those proposals provides a mechanism to deal with the
situation where the newly arriving cell does not belong to the selected
connection or the sub-queue of the selected connection is empty. By
introducing a Bargain Vector, we provide a very effective and practical way
to handle this situation. If the selected sub-queue u is empty, a cell can be
discarded from another non-empty sub-queue. This event is recorded by a
Bargain Vector B. The reason we call B a Bargain Vector is that it keeps a
record of loss exchange between connections. At time $t_1$ connection u may
lose a cell for connection w, and at $t_2$ connection w may lose a cell for
connection u, and so forth. The content of Bargain Vector B is a set of
integers. For n > 0, $B_u = n$ means that n cells were discarded from
connection u because other selected sub-queues were empty, while
$B_u = -n$ means that n cells were discarded from other sub-queues because
the selected sub-queue u was empty. In the long run, the use of Bargain
Vector B further enhances the fairness preserved by the FRL of (1.1).

In our proposed scheme, the per-VC queueing architecture can be
generalized into per-class queueing. A class is a group of VCs which have
similar traffic characteristics and QoS requirements. Such a
generalization addresses the concerns that arise when there are thousands
of VCs. Current (available on the market) ATM switches support up to 12K
VC/VPs.

We describe the FCD algorithm and discuss its properties in Section 2.
We devote Section 3 to validating the effectiveness of FCD under various
traffic scenarios and switch configurations. In Section 4, we conclude our
study and highlight our findings. Mathematical proofs are given in the
Appendix.

2. FAIR CELL DISCARDING ALGORITHM

2.1 FCD DESCRIPTION 

The model we consider in our study is a complete buffer sharing
scheme, where cells from different connections form logical sub-queues.
Let Q_x be the sub-queue length of connection x and S be the buffer size.
The FCD algorithm is given as follows:

Initialize:

    N is the number of active connections;
    For x = 1:N  Calculate R_x^fair (the FRL of 1.1); Set B_x = 0

After a call is accepted:

    N = N + 1;
    For x = 1:N  Calculate R_x^fair (the FRL of 1.1)

When a cell of connection w arrives:

1. IF sum_x Q_x = S (Buffer is full)

   (a) Select u in {1, N} according to R^fair

       ■ Option I - Deterministic selection

       ■ Option II - Probabilistic selection

   (b) IF Q_u = 0 or B_u > 0
           Choose v by B_v = min {B_x ; Q_x > 0}
           Update B_v = B_v + 1 and B_u = B_u - 1
       ELSE
           v = u

   (c) IF v = w
           Discard the newly arrived cell
       ELSE
           Discard a cell from sub-queue v
           Put newly arrived cell in the sub-queue w

2. ELSE (Buffer is not full)

       Put newly arrived cell in the sub-queue w
In Table 1.1 we demonstrate how the values of B can be updated for
the case of 3 connections. The total buffer size is S = 60 (cells) and t_n
stands for the instant of the nth example. If a connection is selected at
t_n, its corresponding queue length is underlined. If a connection's queue
length or its bargain vector value is updated at t_n, the updated value
is highlighted in bold. Note that after the update, the total number of
cells in the buffer is 59, leaving room for the new cell.

Table 1.1 How FCD Works (each entry: value before / value after the update)

        t1       t2       t3       t4       t5
Q1      28/27    55/55    0/0      15/15    6/5
B1      0/0      0/0      -3/-4    1/1      0/0
Q2      2/2      0/0      60/59    20/19    40/40
B2      0/0      0/-1     5/6      -2/-1    -1/-1
Q3      30/30    5/4      0/0      25/25    14/14
B3      0/0      0/1      -2/-2    1/0      0/0



The above table can be further explained as follows:

■ t1: Selected Q1 > 0, B1 = 0; discard one cell from Q1.

■ t2: Selected Q2 = 0; update B2 and B3; discard one cell from Q3.

■ t3: Selected Q1 = 0; update B1 and B2; discard one cell from Q2.

■ t4: Selected Q3 > 0, B3 > 0, B2 = min(B1, B2, B3); update B2
and B3; discard one cell from Q2.

■ t5: Selected Q1 > 0, B1 = 0; discard one cell from Q1.

There are two alternatives for implementing the selection process of step
1(a) in the above FCD algorithm:

Option I - Deterministic implementation. Connection u is obtained
in a Round Robin fashion from a Discard Table (DT) whose
entries are integers of {1, N} representing active connections. A simple
generalized round robin algorithm for constructing the DT based on the
vector {R_x^fair} is given in (Arian et al., 1992). The algorithm is optimal
for N = 2 and pseudo-optimal for N > 2. The DT has to be recomputed
whenever a connection is added or terminated.

Option II - Probabilistic selection. Connection u is obtained
according to the distribution {R_x^fair} using a random number generator.
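
The algorithm of Section 2.1 with the probabilistic selection of Option II
can be sketched in Python as follows. This is a minimal illustration and
not the authors' implementation: sub-queues are represented only by their
lengths, connections are indexed from 0, and the function names
(fair_ratios, select_probabilistic, fcd_arrival) are hypothetical. The
source does not state how ties in min{B_x} are broken; the sketch assumes
the lowest index wins. The selected connection u is passed in explicitly
so that the update rule can be checked against Table 1.1.

```python
import random


def fair_ratios(rates, clr_reqs):
    """FRL of (1.1): lambda_x * p_x, normalised over all active connections."""
    weights = [lam * p for lam, p in zip(rates, clr_reqs)]
    total = sum(weights)
    return [w / total for w in weights]


def select_probabilistic(frl, rng=random):
    """Option II: draw connection index u with probability frl[u]."""
    return rng.choices(range(len(frl)), weights=frl, k=1)[0]


def fcd_arrival(Q, B, S, w, u):
    """Process one arriving cell of connection w; Q and B are updated in place.

    Q[x] is the sub-queue length and B[x] the bargain value of connection x,
    S > 0 is the shared buffer size, and u is the connection selected in
    step 1(a). Returns the connection that lost a cell, or None if the
    buffer was not full.
    """
    if sum(Q) < S:                      # step 2: buffer is not full
        Q[w] += 1
        return None
    if Q[u] == 0 or B[u] > 0:           # step 1(b): u cannot or should not pay
        candidates = [x for x in range(len(Q)) if Q[x] > 0]
        v = min(candidates, key=lambda x: B[x])  # assumed tie-break: lowest index
        B[v] += 1                       # v loses a cell on behalf of u
        B[u] -= 1
    else:
        v = u
    if v != w:                          # step 1(c): v == w drops the arriving cell
        Q[v] -= 1
        Q[w] += 1
    return v


# Instant t4 of Table 1.1: connection 3 (index 2) is selected, B3 > 0,
# so the cell is taken from connection 2 (index 1), which has the minimum B.
Q, B = [15, 20, 25], [1, -2, 1]
loser = fcd_arrival(Q, B, S=60, w=0, u=2)
print(loser, Q, B)  # 1 [16, 19, 25] [1, -1, 0]
```

In the usage example the arriving cell is assumed to belong to connection
1 (index 0); Table 1.1 itself shows the queue lengths before the new cell
is stored, which is why Q1 reads 15 there and 16 here.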

2.2 PROPERTIES OF FCD 

We now state some theoretical properties of the FCD algorithm. Proofs 
are given in the Appendix. 

Property I. For given total buffer size and total bandwidth, the FCD
algorithm is path-wise equivalent (in terms of total cell losses) to
complete buffer sharing schemes where cell dropping occurs upon cell arrival.

Property II. For given buffer size and bandwidths $C_1 > C_2$, the
resulting CLRs (Cell Loss Ratios) satisfy

$$\frac{P_1(C_1)}{P_1(C_2)} = \frac{P_2(C_1)}{P_2(C_2)} = \cdots = \frac{P_N(C_1)}{P_N(C_2)}$$

Properties I and II do not claim that FCD controls the aggregate loss
performance, but rather that it re-distributes the losses across
connections in a fair manner. A direct benefit from Properties I and II is
that FCD adds more freedom to the CAC (Connection Admission Control)
process and reduces buffer management overhead. In FCD one can have
both the efficiency of complete shared buffering and the performance
advantages of dedicated buffering. The latter is achieved through selective
discarding where a new arrival is not automatically rejected when the
buffer is full.

Property III. If a subset of connections $V \subset U$ submits traffic at a
lower mean rate, $\lambda_v^*$, than their declared rate, $\lambda_v$, i.e.
$\lambda_v^* < \lambda_v$, then the resulting CLRs satisfy

(a) $P_u^* < P_u, \quad \forall u \in \{1, N\}$

and

(b) $\dfrac{P_w^*}{p_w} < \dfrac{P_v^*}{p_v}, \quad \forall w \notin V$ and $\forall v \in V$

Property III implies that if a connection v is submitting traffic at a
lower rate than what it declares, all active connections' loss rates are
lower; however, this particular connection v's normalized CLR is higher
than the others. We define "normalized CLR" as CLR / CLR requirement. Also
from Property II we note that if all active connections are submitting
traffic at their declared rates, FCD guarantees their normalized CLR to
be equal.
Property IV. If a subset of connections $V \subset U$ submits traffic at a
higher mean rate, $\lambda_v^*$, than their declared rate, $\lambda_v$, i.e.
$\lambda_v^* > \lambda_v$, then the resulting CLRs satisfy

(a) $P_u^* > P_u, \quad \forall u \in \{1, N\}$

and

(b) $\dfrac{P_w^*}{p_w} > \dfrac{P_v^*}{p_v}, \quad \forall w \notin V$ and $\forall v \in V$

Property IV states that if some connections violate their traffic
contracts, all connections' QoS will be degraded. In order to protect
conforming sources from violating sources, misbehaving traffic will be
tagged by UPC (Usage Parameter Control) and, in case of buffer overflow,
arriving tagged cells are discarded first before FCD is applied.



2.3 ROBUSTNESS OF FCD 



Regarding the use of the bargain vector, a question one may ask is
whether B needs to be reset periodically. If connection u is silent for a
period $\Delta T$, then $|B_u(\Delta T + t) - B_u(t)| < p_u \lambda_u \Delta T$,
and the time needed to accumulate one debt will be

$$T_u > \frac{1}{p_u \lambda_u}$$

Assume that most applications' mean rates are in the range of megabits
per second (1 Mbps = 2359 cells/sec), and that CLR requirements are around
$10^{-7}$; it is then safe to say that $T_u$ is around 4240 seconds. For
instance, if a connection is idle for more than one hour and its mean rate
is 1 Mbps and its CLR requirement is $10^{-7}$, we have $|B_u(1\ \mathrm{hour})| < 1$.
In order to have more than 10 debts, on average it will take more than 10
hours. It is very rare for a connection to be silent for such a long period
of time and still be considered an active connection. Thus, in conclusion,
a periodic reset of the Bargain Vector is normally not necessary.
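
The figure of 4240 seconds can be checked with a short calculation. The
sketch below assumes 53-byte ATM cells (424 bits) and a CLR requirement of
10^-7, which is the value that reproduces the quoted 4240 s; the helper
names are hypothetical.

```python
CELL_BITS = 53 * 8  # one ATM cell is 53 bytes, i.e. 424 bits


def cells_per_second(bit_rate_bps):
    """Convert a mean bit rate into a mean cell rate."""
    return bit_rate_bps / CELL_BITS


def min_time_per_debt(mean_cell_rate, clr_requirement):
    """Lower bound T_u > 1 / (p_u * lambda_u) on the time to accumulate one debt."""
    return 1.0 / (clr_requirement * mean_cell_rate)


lam = cells_per_second(1e6)          # about 2358.5 cells/sec for 1 Mbps
t_u = min_time_per_debt(lam, 1e-7)   # about 4240 seconds
print(round(lam), round(t_u))
```

Since 4240 s exceeds one hour (3600 s), fewer than one debt is accumulated
during an idle hour, and ten debts need roughly 11.8 hours, in line with
the "more than 10 hours" stated above.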

The bargain vector can also adapt to connection dynamics. When a
new connection is accepted or an existing connection is terminated, there
is no need to reset B. The reason is that FCD always picks a connection
with min{B_x}, which has a significant implication for the case that a
connection is terminated. For instance, let u be the connection being
terminated, with a bargain vector value of B_u.

■ If B_u < 0, it implies that some of the remaining active connections
lost more cells for u, which is also reflected in B_x, x in {1, N}.
The selection rule of min{B_x} ensures that the remaining active
connections share those B_u losses left over by connection u fairly.
Table 1.2 Traffic types (classes) used in simulation studies

Type                   1      2      3      4      5      6
Peak rate (Mb/s)       10     5      2.5    2      5      2.5
Mean rate (Mb/s)       1      1      1      1      0.5    0.2
Burst length (ms)      13     26     52     65     26     26
Mean rate/Peak rate    0.1    0.2    0.4    0.5    0.1    0.8
SCV (x10^2)            5.96   3.92   2.20   1.53   5.96   2.59

SCV stands for Squared Coefficient of Variation of the interarrival time.



■ If $B_u > 0$, it means that connection u has lost more cells than its
fair share. Again, the selection rule of $\min\{B_x\}$ allows the
remaining active connections to benefit equally from it.

■ The value of $B_u$ only represents a relative measure. $B_u > 0$ does
not mean that u’s CLR exceeds its requirement during its lifetime,
nor does $B_u < 0$ mean that the remaining active connections’ CLRs
are higher than their requirements.

3. PERFORMANCE OF FCD 

The performance of FCD has been carefully evaluated and validated
via extensive simulations. In this section, we describe the simulation
methodology and discuss the results.

3.1 SIMULATION MODEL 

In our study, six types of On-Off sources are used and their charac- 
teristics are described in Table 1.2. The simulation runs were indepen- 
dently replicated 20 times, and each run included the transmission of 
2 X 10^ cells. Confidence intervals are calculated using the Student’s t
distribution with 98% confidence. In order to perform the simulations
with sufficient statistical quality and also within a reasonable time, we 
chose the values of desired cell loss ratio 10“"^, 2 x and 5 X 10“^. 
Although 10“^ cell loss ratio is higher than that of some real applica- 
tions, it does not preclude us from demonstrating the effectiveness of the 
method. This can be justified by Theorems I and II and was verified by 
experiments not shown here. 
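
As an illustration of the interval computation described above, the following sketch builds a 98% confidence interval from 20 independent replications. The per-run CLR values are made up for the example, and the critical value 2.539 is the tabulated two-sided 98% Student's t quantile for 19 degrees of freedom; neither comes from the simulations reported here.

```python
import math
import statistics

# Tabulated two-sided 98% Student-t critical value for n - 1 = 19 degrees of freedom.
T_CRIT_98_DF19 = 2.539


def confidence_interval(samples, t_crit):
    """Return (low, high) for the mean of independent replications."""
    n = len(samples)
    mean = statistics.mean(samples)
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width


# Hypothetical per-run CLR estimates from 20 replications (illustrative data only).
clr_runs = [1.0e-4 + 1.0e-6 * ((7 * i) % 11 - 5) for i in range(20)]

low, high = confidence_interval(clr_runs, T_CRIT_98_DF19)
print(f"mean = {statistics.mean(clr_runs):.3e}, 98% CI = [{low:.3e}, {high:.3e}]")
```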

Figure 1.1 Steady state - homogeneous CLR with DUA (panel title:
Simulation Results with 10 Connections: DUA + WRR)

FCD’s loss fairness is measured by the following error function. Let
$LS_x$ be the number of cell losses of connection x collected during the
simulation lifetime, and let K be the total number of simulation runs. We
then define an error function called Fair Error (FE) to measure the fairness
performance:






FE 






LS, 



E ly 

y=i 



LSv 



K{N - 1) 



In the following sections we present the performance of FCD in two 
ways: (i) via monitoring its transient loss behaviours under various traf- 
fic and switch configurations; (ii) by showing long-term loss performance 
in terms of mean and confidence interval of CLR over the entire simu- 
lation run time. Confidence intervals are shown as vertical solid lines in 
the long-term figures. 



3.2 LOSS PERFORMANCE: FCD VS DUA 

We first compare the loss performance of FCD to that of DUA. The
traffic configuration of the simulation results reported in Figures 1.1 to
1.10 is heterogeneous. For each simulation configuration, the aggregate
traffic is a random combination of the traffic types given in Table 1.2.
We evaluate the FCD algorithm using both homogeneous and heterogeneous
CLR requirements. The total number of active connections used in our
simulations varies from 10 to 20. However, due to space limitations, we
only report results for 10 active connections. We should note that,
whether the configuration is homogeneous or heterogeneous, Property II
implies that the resulting normalized CLR should be the same for all
active connections if FCD is used.

The results of Figures 1.1 and 1.2 show that, although all active
connections have the same traffic load (mean rate) and the same CLR
requirements, under DUA the burstier sources (i.e., VC 1 and VC 2)
experience a higher CLR than the less bursty connections. The results of
Figures 1.5 and 1.6 further demonstrate that, regardless of the CLR
requirements, under the DUA dropping policy the actual CLR experienced by
an individual connection is controlled by its burstiness and its peak rate
relative to others. For instance, less bursty connections (i.e., VC 8,
VC 9 and VC 10) with higher CLR requirements actually experience smaller
CLRs than burstier ones (i.e., VC 1, VC 2 and VC 3) with lower CLR
requirements. Also, for the same burstiness characteristics (i.e. peak
rate to mean rate ratio and squared coefficient of variation of the
interarrival time of cells), if DUA is used as the dropping policy, the
connections with higher peak rates (i.e. VC 1, VC 2 and VC 3) experience
higher CLRs than those with lower peak rates (i.e. VC 4, VC 5, VC 6 and
VC 7).

Figure 1.2 Sample path - homogeneous CLR with DUA (panel title:
Transient Behaviour of Cell Losses: DUA + WRR)

Comparing the simulation results obtained with FCD to those obtained with
DUA (Figures 1.3, 1.4, 1.7, 1.8 versus Figures 1.1, 1.2, 1.5, 1.6
respectively), the effectiveness of FCD is obvious. We observe that FCD
successfully breaks the “loss dependency” on traffic burstiness and
regulates the loss distribution across connections according to their
individual loss requirements. Here “loss dependency” is a relative term
across connections rather than an absolute value of CLR. In order to
understand and fully evaluate the performance of FCD, both in steady state
and in its transient behaviour, we have constructed various configurations
as shown in Table 1.4. FCD yields very satisfactory results for all cases
we investigate here.

It is trivial to interpret the results where all connections have the
same CLR requirements. We thus describe the simulation results for
heterogeneous CLR requirements in greater detail. We generated
Figures 1.6 and 1.8 by normalizing the sample paths of the simulations
using DUA and FCD respectively. By “normalizing” we mean … From
Property II, ideally all connections’ normalized CLRs should be the same.
Figure 1.6 simply shows the very unsatisfactory results of DUA, while
Figure 1.8 demonstrates the significant effectiveness of FCD.

Figure 1.3 Steady state - homogeneous CLR with FCD (panel title:
Simulation Results with 10 Connections: Deterministic FCD + WRR)

Figure 1.4 Sample path - homogeneous CLR with FCD (panel title:
Transient Behaviour of Cell Losses: Deterministic FCD + WRR)

Simulation Results with 10 Connections: DUA + WRR

Figure 1.6 Sensitivity to heterogeneous CLR requirements - sample path -
heterogeneous CLR with DUA (panel title: Transient Behaviour of Cell
Losses: DUA + WRR)

Figure 1.7 Steady state - heterogeneous CLR with FCD

3.3 SENSITIVITY TO DIFFERENT 
SCHEDULING ALGORITHMS 

Figure 1.8 Sensitivity to heterogeneous CLR requirements - sample path -
heterogeneous CLR with FCD (panel title: Transient Behaviour of Cell
Losses: Deterministic FCD + WRR)

As mentioned above, scheduling algorithms affect the sub-queue length
distribution, whose dynamics in turn directly impact FCD’s short-term
fairness preserved by B. For instance, the probability that one finds
a sub-queue length to be zero varies with different service scheduling
algorithms. However, in practice, the extreme case where a chosen
connection’s sub-queue is always empty is very rare and should be avoided
by any kind of scheduling algorithm. In our simulations we have used two
kinds of service scheduling algorithms: Weighted Round Robin (WRR)
and Equally Weighted Round Robin (ERR). Since our purpose here is
to show the differences induced by using different service scheduling
algorithms, how to assign weights in WRR is not our concern in this paper.
However, we do believe in assigning weights based on traffic
characteristics. We also chose to assign weights that are dramatically
different from ERR so as to investigate the dependency between FCD and the
service discipline. For the cases where 10 active connections are
multiplexed, the weights used in simulations with homogeneous CLR
requirements are given in Table 1.3 as Policy I, and those used in
simulations with heterogeneous CLR requirements as Policy II.
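
To make the difference between the two service disciplines concrete, here is a minimal sketch of one common WRR variant, in which each sub-queue may send up to its weight in cells per round. The text does not specify the WRR variant used in the simulations, so this is an assumption, and the queue contents and weights below are hypothetical rather than those of Table 1.3. ERR is the special case where every weight is 1.

```python
from collections import deque


def wrr_schedule(queues, weights, slots):
    """Serve up to weights[i] cells from queue i per round; return the service order."""
    order = []
    while len(order) < slots and any(queues):
        for i, q in enumerate(queues):
            for _ in range(weights[i]):
                if not q or len(order) >= slots:
                    break
                order.append((i, q.popleft()))
    return order


def served_per_queue(order, n_queues):
    """Count how many cells each queue got served."""
    counts = [0] * n_queues
    for queue_index, _cell in order:
        counts[queue_index] += 1
    return counts


def make_queues():
    # Three backlogged sub-queues with 10 cells each (hypothetical load).
    return [deque(range(10)) for _ in range(3)]


wrr_counts = served_per_queue(wrr_schedule(make_queues(), [3, 1, 1], 10), 3)
err_counts = served_per_queue(wrr_schedule(make_queues(), [1, 1, 1], 10), 3)
print("WRR service counts:", wrr_counts)
print("ERR service counts:", err_counts)
```

With unequal weights the heavily weighted sub-queue drains faster, which changes how often a given sub-queue is found empty; this is the short-term effect discussed above.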

We have run simulations using both WRR and ERR. The results
shown in Table 1.4 indicate that WRR always yields better performance
regardless of the FCD implementation, probabilistic or deterministic.
Although the weights used in the service scheduling differ significantly
between WRR and ERR, the resulting steady-state differences in loss
performance when FCD is used are negligible compared with the difference
between FCD and DUA. We thus conclude that, in the long run, FCD is quite
robust to the service scheduling.
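
The comparison can be made explicit from the Fair Error values of Table 1.4. The sketch below only rearranges those published numbers; the dictionary layout and variable names are illustrative.

```python
# Fair Error values from Table 1.4: (10 connections, 20 connections).
fair_error = {
    ("DUA", "-", "WRR"): (0.6371601, 0.6508583),
    ("FCD", "Probabilistic", "ERR"): (0.04022239, 0.05977245),
    ("FCD", "Deterministic", "ERR"): (0.03158206, 0.05036296),
    ("FCD", "Probabilistic", "WRR"): (0.02873973, 0.05061843),
    ("FCD", "Deterministic", "WRR"): (0.01570717, 0.03973919),
}

dua = fair_error[("DUA", "-", "WRR")]
fcd = {key: value for key, value in fair_error.items() if key[0] == "FCD"}

# Largest FE difference among the four FCD configurations, per connection count.
spread = [max(v[i] for v in fcd.values()) - min(v[i] for v in fcd.values())
          for i in range(2)]
# Smallest FE difference between DUA and any FCD configuration.
gap = [dua[i] - max(v[i] for v in fcd.values()) for i in range(2)]

for i, n in enumerate((10, 20)):
    print(f"{n} connections: FCD spread = {spread[i]:.4f}, DUA-FCD gap = {gap[i]:.4f}")
```

In both columns the gap between DUA and the worst FCD configuration is more than twenty times the spread among the FCD configurations themselves.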
Table 1.3 Weight Assignment for WRR Configurations

         Policy I for WRR      Policy II for WRR     ERR
VC       Traffic    Weight     Traffic    Weight     Weight
VC 1     Type 1     71         Type 1     42         1
VC 2     Type 1     71         Type 1     42         1
VC 3     Type 2     31         Type 1     42         1
VC 4     Type 2     31         Type 5     5          1
VC 5     Type 2     31         Type 5     5          1
VC 6     Type 2     31         Type 5     5          1
VC 7     Type 3     12         Type 5     5          1
VC 8     Type 3     12         Type 6     1          1
VC 9     Type 3     12         Type 6     1          1
VC 10    Type 4     3          Type 6     1          1

Table 1.4 Various Configurations and Their Corresponding Fairness - Fair Error

Discarding   Implementation   Service      10            20
Policy       Option           Scheduling   Connections   Connections
DUA          -                WRR          0.6371601     0.6508583
FCD          Probabilistic    ERR          0.04022239    0.05977245
FCD          Deterministic    ERR          0.03158206    0.05036296
FCD          Probabilistic    WRR          0.02873973    0.05061843
FCD          Deterministic    WRR          0.01570717    0.03973919



CLR requirement is 10“'^. For 10 connection cases, combination of source traffic is: 2 Type 1, 3 
Type 2, 4 Type 3 and 1 Type 4; link capacity = 28.72 (Mbs/sec) and buffer size S = 306 (cells). 
For 20 connection cases, combination of source traffic is: 8 Type 1, 6 Type 2, 3 Type 3 and 3 Type 
4; link capacity = 52.10 (Mbs/sec) and buffer size B = 306 (cells). 



3.4 DETERMINISTIC IMPLEMENTATION 

VS PROBABILISTIC IMPLEMENTATION 

We propose two ways of implementing FCD. The simulation results 
of Table 1.4 suggest that deterministic implementation is a better can- 
didate. Also we note that among all configurations which we have con- 
structed, the best performance comes from a combination of determin- 
istic implementation and WRR service scheduling. 
Figure 1.11 Sensitivity to load variation - Sample path of simulations
(panel title: Transient Behaviour of Cell Losses: Deterministic FCD + WRR)

Figure 1.12 Sensitivity to load variation - Sample path of simulations
(panel title: Transient Behaviour of Cell Losses: Deterministic FCD + WRR)



3.5 SENSITIVITY TO LOAD VARIATION 

One of the most important aspects of any traffic control mechanism
is how it handles traffic variation and what its reactions to that
variation imply. Since FCD uses only the mean arrival rate as its traffic
input parameter, there is no need for us to worry about variation in the
burstiness of the traffic. As discussed in the previous sections, FCD is
resilient to the burstiness of the traffic.

Regarding the average load variation, one may face the situation where
fairness-related issues are heavily involved. In our study we argue that,
if a connection submits traffic at a lower rate than it declared, the
outcome is fair provided that the resulting CLRs of all connections are
lower than they would be at the declared rates and that Property II holds.

Figure 1.13 Sensitivity to buffer size - Sample path of simulations
(panel title: Transient Behaviour of Cell Losses: Deterministic FCD + WRR)

Figures 1.12 and 1.11 present the simulation results when some
connections (i.e., VC 1 and VC 2) submit traffic at a lower mean rate
than their declared rates. Figure 1.12 shows exactly what Property
III predicts: (a) all connections’ CLRs become lower; (b) VC 1 and
VC 2 have higher relative CLRs than VC 10. On the other hand,
Figure 1.11 indicates that all connections’ actual cell loss rates in terms
of “cells/sec” decrease in the same proportion, given by
$\lambda^* P^* / (\lambda P)$ for $P^* < P$. Note that P stands for the
aggregate CLR when all connections submit traffic at their declared rates,
while $P^*$ stands for the aggregate CLR when some of the connections
submit traffic at lower rates than they declared.
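
A small numerical sketch may help in reading these two observations. It assumes the relation $p_x^* = \lambda^* P^* \lambda_x p_x / (\lambda_x^* \lambda P)$ as read from (1.A.3) in the Appendix; all rates and CLR values are hypothetical, and the reduced aggregate CLR $P^*$ is supplied as an input because it would normally come from a simulation.

```python
def clr_after_load_drop(declared, actual, clr, agg_clr_new):
    """Per-connection CLRs after some connections send below their declared rate.

    Applies p_x* = lam* P* lam_x p_x / (lam_x* lam P), the relation assumed
    from (1.A.3). Returns the new CLR list and the factor lam* P* / (lam P).
    """
    lam = sum(declared)
    lam_star = sum(actual)
    agg_clr = sum(l * p for l, p in zip(declared, clr)) / lam  # aggregate CLR P
    scale = lam_star * agg_clr_new / (lam * agg_clr)
    new_clr = [scale * l * p / a for l, a, p in zip(declared, actual, clr)]
    return new_clr, scale


declared = [2359.0, 2359.0, 1179.5]  # declared mean rates in cells/sec (hypothetical)
actual = [1179.5, 2359.0, 1179.5]    # connection 0 sends at half its declared rate
clr = [1e-4, 1e-4, 2e-4]             # CLRs when everyone sends at the declared rate
new_clr, scale = clr_after_load_drop(declared, actual, clr, agg_clr_new=0.5e-4)

relative = [n / o for n, o in zip(new_clr, clr)]
loss_rate_ratio = [a * n / (d * o)
                   for a, n, d, o in zip(actual, new_clr, declared, clr)]
print("relative CLR p*/p:", [round(r, 4) for r in relative])
print("loss-rate ratio (cells/sec):", [round(r, 4) for r in loss_rate_ratio])
```

In this example every CLR decreases, the connection that reduced its load keeps the highest relative CLR, and the loss rate in cells/sec of every connection shrinks by the same factor.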

3.6 SENSITIVITY TO BANDWIDTH AND 
BUFFER ALLOCATION 

Properties I and II state that FCD does not control the aggregate loss
performance; it assures proper loss distribution across connections
according to the individual CLR requirements.

We further verify Properties I and II by varying the allocated bandwidth
in our simulations. Figures 1.9 and 1.10 present several simulation sample
paths for homogeneous and heterogeneous CLR requirements respectively.
They clearly show that the CLRs of all the active connections increase
as the allocated bandwidth decreases, and that they all increase in
the same proportion.
Figure 1.14 Sensitivity to heterogeneous CLR requirements and buffer size -
sample path of simulations (panel title: Transient Behaviour of Cell
Losses: Deterministic FCD + WRR)



FCD reacts to a decrease or increase in the buffer size in a way similar
to its reaction to a decrease or increase in the bandwidth. The
corresponding simulation results are given in Figures 1.13 and 1.14.

3.7 SENSITIVITY TO CONNECTION LEVEL 
DYNAMICS 

An important issue for any traffic control scheme is how it reacts to
traffic dynamics. In the above simulation studies, we evaluated FCD’s
sensitivity to traffic dynamics by generating bursty traffic and varying
the traffic load. In this section we validate FCD’s robustness in response
to connection-level dynamics in the following two forms: (1) impact on
the remaining connections when some of the connections are terminated
with non-zero values of B (Bargain Vector); (2) impact on existing
connections when some new connections join in.

In our simulations, for the sake of simplicity, we do not re-calculate
the required bandwidth when connections leave or join. Thus, for the same
buffer space and bandwidth, the resulting CLRs of the remaining
connections become lower after VC 1 leaves. Figure 1.15 compares the CLRs
of all connections in steady state for two cases: 10 connections are
started and terminated together; and VC 1 is terminated earlier, at 500
seconds of simulation time. The two parallel curves imply that FCD adapts
very well to connection-level dynamics. This also validates our reasoning
on B’s robustness in Section 2.3.

Figures 1.16 and 1.17 describe the transient behaviours of some sample
connections when connection VC 1 either leaves earlier or joins later
during the simulation. We note that there are some jumps around 500
seconds in Figure 1.17. The reason is that we reset the simulation
measurements when VC 1 becomes active around 500 seconds of simulation
time. However, all connections converge very closely after a very short
period. Another way to present the results of Figure 1.17 is to show the
accumulated CLRs of VC 2 and VC 10 without re-initialization of
measurements. As a result, we observe smoother curves in Figure 1.18. For
the case where VC 1 leaves earlier, there is no need to reset the
measurements, since all we are concerned about is whether the relative
cell loss distribution across the remaining active connections is kept the
same. Regardless of how the results are presented, the important message
is that FCD adapts very well to traffic dynamics at the connection level
and preserves the loss fairness given by FRL of formula (1.1).

Figure 1.15 Sensitivity to connection level dynamics - steady state
(panel title: Simulation Results with 10 Connections: Deterministic FCD + ERR)

Figure 1.16 Sensitivity to connection level dynamics - Sample path of
simulations

Figure 1.17 Sensitivity to connection level dynamics - Sample path of
simulations with re-initialization of measurements at 500 seconds (panel
title: Transient Behaviour of Cell Losses: Deterministic FCD + ERR)

Figure 1.18 Sensitivity to connection level dynamics - Sample path of
simulations without re-initialization of measurements at 500 seconds

4. CONCLUSIONS 

Finally we conclude our study and highlight our findings:

■ Simulation results confirm the theoretical findings that FCD provides
a robust mechanism to allocate cell loss to diverse connections in a
fair manner.

■ FCD does not require prior knowledge of detailed traffic
characteristics; only the mean rate is needed, which can be declared by
connections at setup time or estimated during the connection lifetime.

■ FCD is a solution for heterogeneous environments in terms of both
traffic characteristics and QoS requirements.

■ In the long run (i.e. once the accumulated number of losses exceeds 100
cells), FCD is insensitive to which service scheduling algorithm is
used and to how it is implemented (static vs probabilistic).

■ Scheduling has a direct impact on short-term performance.

■ The FCD algorithm is very simple and can be implemented without
incurring much overhead. It is only active when the total buffer
occupancy reaches its limit.

■ The deterministic implementation of FCD outperforms the probabilistic
one in terms of the resulting values of FE, i.e. FE(det.) < FE(prob.).

■ FCD is a robust scheme in the sense that it can co-exist with various
resource allocation and scheduling schemes, which gives us greater
flexibility to deal with complicated ATM traffic management issues and
allows us to decouple the methods for addressing those issues.

Appendix: Mathematical Proofs 
Proof of Property II 

Since, for a given buffer size and traffic characteristics, allocated
bandwidths $C_1 > C_2$ yield aggregate CLRs $P(C_1) < P(C_2)$, we have
for $u \in \{1, N\}$

p^jCr) „ F(Cr)Lt^^ P{C^) 

Pu{C2) ~ P{C2)Lt" nCi) ’ 

with equality when = 0. 

□ 

Proof of Property III 

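If a subset of connections $V \subset U$ submits traffic at a lower mean
rate, $\lambda_v^*$, than their declared rate, $\lambda_v$, i.e.
$\lambda_v^* < \lambda_v$, the resulting total mean arrival rate is
$\lambda^* < \lambda$ and the corresponding aggregate cell loss ratio is
$P^* < P$. We thus have

$$\lambda^* = \lambda - \sum_{v \in V} \lambda_v + \sum_{v \in V} \lambda_v^* \qquad (1.A.2)$$

$$p_v^* = \frac{\lambda^* P^* \lambda_v p_v}{\lambda_v^* \lambda P} \qquad (1.A.3)$$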



■ Let $v$ be a particular connection of subset $V$. For connection $v$,
what we want to prove is $p_v^* < p_v$ or

$$\frac{\lambda^* \lambda_v P^*}{\lambda_v^* \lambda P} < 1 \qquad (1.A.4)$$



Since P* < P, as long as the following inequality (1.A.5) is true 



^<1 or \*yX-\vy<\ (1.A.5) 



the inequality of (1.A.4) will be true. From (1.A.2) we have 



A* — A — ^ A^; + ^ A* — Ay -1- Ay 

vizV vi^V 







and it is trivial to prove the following inequality 

- AyA* < A^A - Ay(A* + (A, - A;)) 

= —(Ay — Ay)(A - Ay) = —(A — Ay)^ < 1 

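For $\forall w \notin V$, we have

$$p_w^* = \frac{\lambda^* P^* \lambda_w p_w}{\lambda_w \lambda P} = \frac{\lambda^* P^* p_w}{\lambda P} < \frac{\lambda P^* \lambda_w p_w}{\lambda_w \lambda P} = \frac{P^* p_w}{P} < p_w. \qquad (1.A.6)$$

From (1.A.3) and (1.A.6), we have

$$\frac{p_w^*}{p_w} - \frac{p_v^*}{p_v} = \frac{\lambda^* P^*}{\lambda P} - \frac{\lambda^* P^* \lambda_v}{\lambda P \lambda_v^*} = \frac{\lambda^* P^*}{\lambda P}\left(1 - \frac{\lambda_v}{\lambda_v^*}\right) < 0 \qquad (1.A.7)$$

Thus

$$\frac{p_w^*}{p_w} < \frac{p_v^*}{p_v}.$$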



□ 

Property IV can be proved similarly. 
PART EIGHT



Tools and Techniques 




Efficient Computation of Waiting Time Moments for the
DBMAP/G/1/N Queue*

Christoph Herrmann 
Philips GmbH Research Laboratories 
Weisshausstr. 2, P.O. Box 500145, 52085 Aachen 
e-mail: herrmann@pfa.research.philips.com 

In the recent past, discrete-time queueing models of the DBMAP/G/1/N-type
(D[B]MAP: Discrete [Batch] Markovian Arrival Process) were developed, which
allow for computation of per-stream loss probabilities, one type of solution
based on the method of the unfinished work, and the other using the
M/G/1-paradigm. The method of the unfinished work (applicable for
deterministic service time only) provides - besides formulae for computing
per-stream loss probabilities - also expressions for obtaining per-stream
waiting-time probability functions, and therefore arbitrary moments of the
waiting-time of each stream [7]. What is missing up to now (see also [3])
is an algorithmic recipe to compute waiting-time moments for the
DBMAP/G/1/N queueing system. This paper derives the z-transform of the
actual waiting-time and derives a fast algorithm to determine the
waiting-time probability function for an arbitrary renewal service time.
The framework presented allows dealing with a superposition DMAP+DBMAP as
an input process. For both streams in the superposition, per-stream
waiting-time probability functions are presented for the case of a
deterministic service time, which is relevant for modelling of ATM
networking (both wired and wireless). Together with findings for the
continuous-time case ([2, 12]), this paper completes the insights into
finite queueing systems of the M/G/1-type. In comparison with the method of
the unfinished work, the M/G/1-paradigm provides much faster algorithms
to compute loss probabilities and waiting-time moments, due to the smaller
system matrix. Note that the D(B)MAP has proved a versatile stochastic
process, which can also be tuned to represent periodic correlation
functions [8] and not only geometrically decaying ones.

Keywords DMAP, DBMAP, MAP, BMAP, discrete-time queue, finite buffer,
waiting-time probability function, z-transform of the waiting-time
probability function.

1. Introduction 

So far there is no expression for the z-transform of the waiting-time
probability function in discrete-time M/G/1-type queueing systems. [2]
presents the solution for the MAP/G/1/N-case, which is continuous-time. The
essential trick there is a decomposition of the series
$[\mathbf{I}s + (\mathbf{C} + z\mathbf{D})]^{-1}$ into a sum of coefficients
in $z$. However, this is not applicable to the general case of the
BMAP/G/1/N queue, where this sum becomes
$[\mathbf{I}s + \sum_k \mathbf{D}_k z^k]^{-1}$; neither is it helpful for
obtaining the z-transform of the waiting-time of the DBMAP/G/1/N-queue^1.
This paper provides an algorithm both for computing the z-transform of the
(actual) waiting-time of the above queueing system and for obtaining the
waiting-time’s probability function. The results are so general that the
superposition of two input streams DMAP+DBMAP, which is again a DBMAP, can
also be considered. For this superposition, per-stream expressions for
the waiting-time probabilities are presented in case of a deterministic
service time. Together with [7], which treats per-stream loss probabilities
in the DBMAP/G/1/N-queue, the full analysis of such discrete-time queues is
now available. With respect to performance evaluation of wireless ATM
networking, waiting-time moments are of more interest, since they determine
the expected delay and delay jitter when the loss probability is
sufficiently reduced by Forward Error Correction.

*The author’s daily work on wireless ATM networks is supported by: The
Federal Ministry of Education, Science, Research, and Technology, Germany

2. The Discrete Batch Markovian Arrival Process (DBMAP) 

We recall the definition of the DBMAP (discrete-time analogue of the Batch
Markovian Arrival Process due to Lucantoni [11]) given in [4]. Consider an
m-state Markov chain (MC). A transition from state $S(t) = i$ to state
$S(t+1) = j$, $t = 0, 1, \ldots$, happens according to the transition
probabilities $[\mathbf{D}_w]_{ij}$ ($\mathbf{D}_w$: transition matrix,
$[\,\cdot\,]_{ij}$: entry in the i-th row and j-th column),
$w = 0, 1, 2, \ldots, b_{max}$, $\mathbf{D}_w := \mathbf{0}$ for
$w > b_{max}$, and a transition is interpreted as an arrival of a w-batch
(batch of size w) if $w > 0$, and there is no arrival if $w = 0$.
Thus the model generates interarrival times with batch arrivals possible,
and successive interarrival times are not independent. In order to
distinguish from other state random variables, the m states of the DBMAP
are usually referred to as phases. The MC has different types of
transitions between two phases i and j, thus extending the conventional
definition of a MC. $\mathbf{D} := \sum_{w=0}^{b_{max}} \mathbf{D}_w$ is
a stochastic matrix and represents the (conventional) MC which only
considers the transitions between two phases no matter what type of
transition. Let $\boldsymbol{\pi}$ denote the stationary phase probability
vector, $\boldsymbol{\pi}\mathbf{D} = \boldsymbol{\pi}$. It can be proved
that the point process of the arrival instants is Semi-Markovian with the
following SM kernel (with $T_n$ denoting the discrete-time instant of the
n-th arrival)

$$P\{S(T_{n+1}) = j,\ T_{n+1} - T_n = k \mid S(T_n) = i\} = [\mathbf{D}_0^{k-1} (\mathbf{D} - \mathbf{D}_0)]_{ij}$$

^1 [1] derives the Laplace transform of the waiting-time distribution
function, which however leaves open the decomposition of
$[\mathbf{I}s + \sum_k \mathbf{D}_k z^k]^{-1}$, which corresponds to
$[\mathbf{R}(z) + s\mathbf{I}]^{-1}$ in the N-process notation used there,
and confines to the special case of the Markov Modulated Poisson Process,
for which this decomposition can be derived. Note that the differentiation
approach proposed here can also be applied to the decomposition of
$[\mathbf{I}s + \sum_k \mathbf{D}_k z^k]^{-1}$. Thus, the general expression
for the Laplace transform of the waiting-time distribution function of the
BMAP/G/1/N is now possible.

The embedded MC has the transition matrix
$(\mathbf{I} - \mathbf{D}_0)^{-1}(\mathbf{D} - \mathbf{D}_0)$. With
$\mathbf{P} = \boldsymbol{\pi}(\mathbf{D} - \mathbf{D}_0) / [\boldsymbol{\pi}(\mathbf{I} - \mathbf{D}_0)\mathbf{e}]$
denoting its stationary phase probability vector at arrival instants^2, we
get for the mean value of the time between successive arrival instants^3

$$E[A] = \mathbf{P} \sum_{k=1}^{\infty} k\, \mathbf{D}_0^{k-1} (\mathbf{D} - \mathbf{D}_0)\, \mathbf{e} = \mathbf{P} (\mathbf{I} - \mathbf{D}_0)^{-1} \mathbf{e}.$$

$\mathbf{e}$ denotes the column vector of 1’s,
$\mathbf{e} = (1, 1, \ldots, 1)^T$, m components. Note that
$E[A]^{-1} \neq \lambda := \boldsymbol{\pi} \sum_{w=1}^{b_{max}} w\, \mathbf{D}_w \mathbf{e}$,
where $\lambda$ is the arrival rate. In the special case of the DMAP
($b_{max} = 1$) it is the custom to write: $\mathbf{C} := \mathbf{D}_0$ and
$\mathbf{D} := \mathbf{D}_1$.
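
The quantities defined in this section can be evaluated numerically for a small example. The sketch below uses a hypothetical 2-phase DBMAP with batch sizes 0, 1 and 2 (the matrices are invented for illustration) and plain Python lists, so that the closed-form 2x2 inverse and the 2-state stationary vector can be written out directly.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Hypothetical 2-phase DBMAP with b_max = 2; the rows of D0 + D1 + D2 sum to 1.
D0 = [[0.6, 0.1], [0.1, 0.3]]
D1 = [[0.1, 0.1], [0.2, 0.2]]
D2 = [[0.0, 0.1], [0.1, 0.1]]

D = [[D0[i][j] + D1[i][j] + D2[i][j] for j in range(2)] for i in range(2)]

# Stationary phase vector pi of a 2-state chain: pi D = pi, pi e = 1.
s = D[0][1] + D[1][0]
pi = [D[1][0] / s, D[0][1] / s]

I_minus_D0 = [[(1.0 if i == j else 0.0) - D0[i][j] for j in range(2)] for i in range(2)]
D_minus_D0 = [[D[i][j] - D0[i][j] for j in range(2)] for i in range(2)]

# P = pi (D - D0) / [pi (I - D0) e]: phase vector at arrival instants.
norm = sum(matmul([pi], I_minus_D0)[0])
P = [x / norm for x in matmul([pi], D_minus_D0)[0]]

# E[A] = P (I - D0)^{-1} e: mean time between successive arrival instants.
mean_interarrival = sum(matmul([P], inv2(I_minus_D0))[0])

# lambda = pi (sum_w w D_w) e: mean number of arriving customers per slot.
lam = sum(w * sum(matmul([pi], Dw)[0]) for w, Dw in ((1, D1), (2, D2)))

print("P =", [round(x, 4) for x in P])
print("E[A] =", round(mean_interarrival, 4), " lambda =", round(lam, 4))
```

In this batch example the reciprocal of E[A] counts arrival instants per slot and is smaller than the customer arrival rate, because an arrival instant may bring two customers.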

2.1. The counting process of a DBMAP 

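Let $N(0;t]$ denote the number of arrivals in the interval $(0,t]$ and define

$$[\mathbf{P}(w,t_0)]_{ij} := P\{N(0;t_0] = w,\ S(t_0) = j \mid N(0;0] = 0,\ S(0) = i\}.$$

The following relations can be proved^4 (see e.g. [7]):

$$\mathbf{P}(w,0) = 1_{w=0} \cdot \mathbf{I}, \qquad \mathbf{I}: (m \times m)\text{-matrix of unity}.$$

$$\mathbf{P}(w,t+1) = \sum_{v=0}^{w} \mathbf{P}(v,t)\, \mathbf{D}_{w-v}, \qquad (1)$$

$$\sum_{w=0}^{\infty} \mathbf{P}(w,t)\, z^w = \Big[\sum_{w=0}^{b_{max}} \mathbf{D}_w z^w\Big]^t =: \mathbf{D}(z)^t \quad \text{for all } t \in \mathbb{N}. \qquad (2)$$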



In [8], DMAPs are parameterized in such a way that periodic correlation func- 
tions for the count process or the interarrival times are obtained. The key idea 
is using a periodic phase transition matrix for the DMAP. 
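
The recursion for the counting matrices P(w, t) in this section translates directly into code. The following sketch uses a hypothetical 2-phase DBMAP (invented matrices) and checks the generating-function relation at z = 1, where the sum of P(w, t) over w must equal the t-th power of D.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_add(a, b):
    """Element-wise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def counting_matrices(D_list, t_max):
    """Return P with P[t][w] = P(w, t), using P(w, t+1) = sum_v P(v, t) D_{w-v}."""
    m = len(D_list[0])
    b_max = len(D_list) - 1
    identity = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    P = [[identity]]  # P(0, 0) = I; P(w, 0) = 0 for w > 0 is left implicit
    for t in range(t_max):
        row = []
        for w in range((t + 1) * b_max + 1):
            acc = [[0.0] * m for _ in range(m)]
            # Only terms with 0 <= w - v <= b_max and v <= t * b_max are non-zero.
            for v in range(max(0, w - b_max), min(w, t * b_max) + 1):
                acc = mat_add(acc, matmul(P[t][v], D_list[w - v]))
            row.append(acc)
        P.append(row)
    return P


# Hypothetical 2-phase DBMAP with batch sizes 0, 1, 2.
D0 = [[0.6, 0.1], [0.1, 0.3]]
D1 = [[0.1, 0.1], [0.2, 0.2]]
D2 = [[0.0, 0.1], [0.1, 0.1]]
D_list = [D0, D1, D2]

P = counting_matrices(D_list, 3)

# Check at z = 1: the sum over w of P(w, 3) equals D^3.
total = [[0.0, 0.0], [0.0, 0.0]]
for Pw in P[3]:
    total = mat_add(total, Pw)
D = mat_add(mat_add(D0, D1), D2)
D_cubed = matmul(matmul(D, D), D)
print("sum_w P(w,3) =", [[round(x, 6) for x in r] for r in total])
print("D^3          =", [[round(x, 6) for x in r] for r in D_cubed])
```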



3. The analysis of the queueing system 

We consider a FIFO queue with a buffer of N places (total capacity: N + 1).
The interarrival times are produced according to a DBMAP, the service times
are i.i.d. (renewal) with the probability function $h(t)$. Due to the
properties of the DBMAP, the occupancy at successive time instants
immediately after a departure results in an embedded Markov Renewal
process. (It is an embedded MRP, since the state in which the MRP rests
between two departures is not equal to the state of the queueing system in
terms of the occupancy between departure epochs, since arrivals can happen
in between.)

The resulting Semi-Markov kernel $\mathbf{q}(t)$ is of M/G/1 type [12]. Let
$\mathbf{q} := \sum_{t=1}^{\infty} \mathbf{q}(t)$ denote the embedded Markov
chain (MC) of the MRP. The stationary state probabilities of this MC are
needed for computing the probability function of the occupancy at arbitrary
and arrival instants, as well as for the probability function of the
waiting-time. Furthermore, the MRP completely characterizes the output
process in terms of the interdeparture times. The computation of the joint
probability functions of successive interdeparture times is done by using
standard techniques of Semi-Markov processes [5, 6].

^2 $\mathbf{P}\mathbf{e} = 1$ and
$\mathbf{P}(\mathbf{I} - \mathbf{D}_0)^{-1}(\mathbf{D} - \mathbf{D}_0) = \mathbf{P}$
are easily verified.

^3 $A_n = T_{n+1} - T_n$ denoting the n-th interarrival time,
$A = \{A_n, n \in \mathbb{N}_0\}$ the stochastic process of the
interarrival times.

^4 $1_{\text{EXPRESSION}} = 1$ if EXPRESSION is true, and $= 0$ else.
3.1. The embedded Markov Renewal process (MRP) at departure
instants

The Semi-Markov kernel $\mathbf{q}(t)$ of the MRP gives the probability
$[[\mathbf{q}(t)]_{k\ell}]_{ij}$^5 that, given a departure leaves k
customers in the system with the input process in phase i, the next
departure happens after t time units and leaves $\ell$ customers in the
system with the input process in phase j. Here, “leave behind” means
“immediately after a departure”, so that an arrival simultaneous with a
departure is taken into account.

3.2. Arrival First Policy 

Due to the fact that arrivals and departures can happen simultaneously with
positive probability in discrete-time queueing systems, one has to define
how a simultaneous arrival and departure are dealt with. If the arriving
customer still sees the departing one in case of simultaneity, this is
called the “arrival first” (AF) policy, which is considered here. The
waiting-time probability function of the other case, “departure first”
(i.e. the departing customer leaves before the arriving one enters in case
of simultaneity), is for further study.

The case AF leads to a Semi-Markov kernel $\mathbf{q}(t)$ similar to that
of the continuous-time case [7]. It is described by the following matrices:

• $\mathbf{A}_k(t)$, where $[\mathbf{A}_k(t)]_{ij}$ denotes the probability
that, given a departure which leaves at least one customer in the system
with the input process in phase i, there are k arrivals within the
following service time lasting t time units, with the input process in
phase j at the next departure. In the present case it is:
$\mathbf{A}_k(t) = \mathbf{P}(k,t)\,h(t)$.

• $\mathbf{B}_k(t)$, where $[\mathbf{B}_k(t)]_{ij}$ denotes the conditional
probability that, given a departure which leaves the system empty with the
input process in phase i, the next departure happens after t time units
with the input process in phase j, and there were k arrivals during the
service time of that departure. In the present case it is

oo t 

w=l V=1 



oo t 

W=1 V=1 

Fig. 1 Semi-Markov kernel for arrival first.

^5 $[[\mathbf{q}(t)]_{k\ell}]_{ij}$ means the submatrix in the k-th main
row and the $\ell$-th column of $\mathbf{q}(t)$ and, in this submatrix, the
entry in the i-th row and the j-th column.

With AF a departure can leave at most N customers in the system (as it is in
the continuous-time case), so that the state space of the MRP is
$\{0, 1, \ldots, N\} \times \{1, 2, \ldots, m\}$. The resulting Semi-Markov
kernel is shown in Fig. 1. The transition probability matrix of the embedded
MC $\mathbf{q}$ is composed of the entries
$\mathbf{B}_k := \sum_{t=1}^{\infty} \mathbf{B}_k(t)$ and
$\mathbf{A}_k := \sum_{t=1}^{\infty} \mathbf{A}_k(t)$, among which there is
the following relation:



oo CO CO k 

w = l v=il t = l w = 0 

4. The z-transform of the waiting-time 

The derivation is based on arguments similar to those used for obtaining expressions
for the occupancy at arrival instants (for details see e.g. [10, 7, 12]).

Consider Fig. 2. At time instant $u$ there is a departure which leaves the
system empty; $v$ time units later, there is a ($\mu$-batch) arrival (i.e. $\mu$ customers
arrive simultaneously, which is incorporated in the input process DBMAP). The
customer of the batch arrival who is admitted first will stay in the server
until his service time has elapsed; in Fig. 2 this service time lasts $t - u - v + r$ time
units. A further arrival at time instant $t$ will see $\mu$ customers in the system.
The time until these customers will have left the system equals the unfinished
work seen at time instant $t$, and it is composed of the residual service time $r$ of
the customer in service and the sum of the service times of the other customers
of the $\mu$-batch.

A second case, where a departure does not leave the system empty, has to be
discussed separately and completes the description of the system's evolution in
time (Fig. 3). Note that as soon as a customer leaves a non-empty system, the
service time of the next customer in service starts.

4.1. Waiting time at finite time instants 

In the following, $r_{k\ell}^{ij'}(u)$ means the probability that, given a departure at time
instant 0 which leaves $k$ customers in the system with the input process in
phase $i$, a departure at time instant $u$ leaves $\ell$ customers in the system with the
input process in phase $j'$. This probability can be expressed by means of the
Semi-Markov kernel describing the queueing system by

$$ r(u) = \sum_{n=1}^{\infty} q^{*n}(u) . $$

For the following discussion, the explicit representation of $r_{k\ell}^{ij'}(u) := [[r(u)]_{k\ell}]_{ij'}$
is not of importance, since it will disappear in the limit case due to the limit
theorem for Markov renewal processes.

Fig. 2 Time scheme in case that a departure leaves the system empty at $u$. $x$
represents the unfinished work seen by the arrival at $t$.

Define $w_{kr}^{ij}(t,x,\mu)$ as the conditional probability that, given a departure at 0
which leaves $k$ customers in the system with the input process in phase $i$, a
$\mu$-batch arrival at $t$ sees a waiting-time of $x$ and an occupancy of $r$ customers with
the input process in phase $j$. Using $r_{k\ell}^{ij'}(u)$ we get (where $\mathcal{N}_t$ is a normalizing
constant depending on the time instant $t$):

^-1 

Vt Wfco(<, X, n) = U=o Yj Yj («) [p( 0 , t - 1 )D^] ^ ( 4 ) 

j'elEu=0 

For $1 \le r \le N$:

Aftwl\{t,x,p) = 

$*n$ denotes the $n$-fold convolution.

Fig. 3 Time scheme in case that a departure leaves a non-empty system at $u$.
$x$ represents the unfinished work seen by the arrival at $t$.



t—2 t—u—\ r 






(w) [p(r — i,t — u — 1)D^1 ^ hr{t — u, 



j'eiE w=o i=i 



where $h_r(t,x)$ is given by:

$$ h_r(t,x) := \begin{cases} 0 & \text{for } r \le 0 \\ h(t+x) & \text{for } r = 1 \\ \sum_{\tau=0}^{x} h(t+\tau)\, h^{*(r-1)}(x-\tau) & \text{for } r \ge 2 \end{cases} $$

Here, $h^{*r}(x)$ means the $r$-fold convolution of $h(x)$. For $r = N+1$, the arrival is
lost. Therefore, this case does not contribute to the waiting-time.
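
As a hedged illustration of the definition of $h_r(t,x)$, the following Python sketch evaluates it by direct convolution and compares its transform with respect to $x$, obtained by plain summation, against the closed form $z^{-t}\,(h_z(z) - \sum_{\nu<t} z^{\nu} h(\nu))\,[h_z(z)]^{r-1}$ used in Section 4.3. The function names and the example service-time probability function are assumptions made for illustration only; they are not taken from the paper's C programmes.

```python
def conv(a, b):
    """Discrete convolution of two finite sequences indexed from 0."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def at(seq, k):
    """Value of a finite sequence at index k, zero outside its support."""
    return seq[k] if 0 <= k < len(seq) else 0.0


def h_r(h, r, t, x):
    """h_r(t, x) for a service-time p.f. h, where h[k] = P{service time = k}."""
    if r <= 0:
        return 0.0
    if r == 1:
        return at(h, t + x)
    rest = [1.0]                      # h^{*0}: unit mass at 0
    for _ in range(r - 1):
        rest = conv(rest, h)          # builds h^{*(r-1)}
    return sum(at(h, t + tau) * at(rest, x - tau) for tau in range(x + 1))


def transform_by_summation(h, r, t, z):
    """sum_x z^x h_r(t, x); the range covers the whole finite support."""
    x_max = r * len(h)
    return sum(z ** x * h_r(h, r, t, x) for x in range(x_max + 1))


def transform_closed_form(h, r, t, z):
    """z^{-t} (h_z(z) - sum_{nu < t} z^nu h(nu)) [h_z(z)]^{r-1}."""
    hz = sum(hk * z ** k for k, hk in enumerate(h))
    head = sum(at(h, nu) * z ** nu for nu in range(t))
    return z ** (-t) * (hz - head) * hz ** (r - 1)


# Hypothetical example: service lasts 1, 2 or 3 time units.
h_example = [0.0, 0.2, 0.5, 0.3]
```

For any $r \ge 1$ and $t \ge 0$ the two transform functions should agree up to rounding, which gives a quick consistency check on an implementation of the residual-service-time term.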

4.2. Waiting time in the limit case 

The limit theorem of Markov Renewal functions (see e.g. [13], p. 125 ff.)
applied to eqs. (4) and (5) yields expressions no longer depending on $i$ and $k$.
Therefore they are dropped in the limit expression defined by

$$ [w_r(x,\mu)]_j := \lim_{t\to\infty} w_{kr}^{ij}(t,x,\mu) , $$

which denotes the probability that in the limit case an arriving $\mu$-batch sees
an unfinished work of $x$ with the input process in phase $j$. Applying the limit
theorem yields

A/'wola:,/^) = ^P^£o(I- (6) 
and for I < r < N 
Afwj.{x, fi) = 

I \ f \ 

= (£o(I-Do)“^D^+^| y^P(r-^,o-)D^Ar(o-+l,®) , (7) 

1=1 LV / cr = 0 

where $E^*$ is the mean sojourn time of the underlying Markov Renewal Process,
which is here the mean interdeparture time of the queue. $E^*$ need not be
computed due to the normalizing constant $\mathcal{N}$.

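4.3. Efficient computation of the z-transform

Define the z-transform of the waiting-time by

$$ W_r(z,\mu) := \sum_{x=0}^{\infty} w_r(x,\mu)\, z^x . $$

For $h_r(t,x)$ we get:

$$ h_r(t,x) \;\circ\!\!-\!\!\bullet\; z^{-t} \left( h_z(z) - \sum_{\nu=0}^{t-1} z^{\nu} h(\nu) \right) [h_z(z)]^{r-1} \quad \text{for } r \ge 1 , $$

if the z-transformation is defined by $f(k) \circ\!\!-\!\!\bullet \sum_{k=0}^{\infty} f(k) z^k$. $h_z(z) = \sum_{k=0}^{\infty} h(k) z^k$ is
the z-transform of the service time.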

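Theorem 1 For AF:

$$ \mathcal{N}\, W_r(z_1,\mu) = \begin{cases}
\frac{1}{E^*}\, x_0 (I - D_0)^{-1} D_\mu & \text{for } r = 0 \\
\frac{1}{E^*} \sum_{\ell=1}^{r} \left( x_0 (I - D_0)^{-1} D_\ell + x_\ell \right) h_{r-\ell}(z_1)\, [h_z(z_1)]^{\ell-1} D_\mu & \text{for } 1 \le r \le N \\
0 & \text{for } r = N+1
\end{cases} \tag{8} $$

$\mathcal{N}$ is determined by a completeness condition, which can e.g. be defined by
$\sum_{r}\sum_{\mu} W_r(1,\mu)\,\mathbf{1} = 1$, and then refers to the unfinished work seen at an arrival
instant. It is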

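$$ h_n(z_1) = h_z(z_1)\,\mathcal{R}_n(z_1) - \sum_{k=0}^{n} \mathcal{R}_k(z_1)\, A_{n-k}\, [h_z(z_1)]^{n-k} , \tag{9} $$

where the $\mathcal{R}_n(z_1)$ are coefficients of the series

$$ \sum_{n=0}^{\infty} \mathcal{R}_n(z_1)\, z_2^n = [I z_1 - D(z_2 h_z(z_1))]^{-1} , \tag{10} $$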




^ [hz{z)Y ^ for r > 1 
for which the following recursion holds

$$ \mathcal{R}_n(z_1) = \left[ \sum_{k=0}^{n-1} \mathcal{R}_k(z_1)\, D_{n-k}\, [h_z(z_1)]^{n-k} \right] (I z_1 - D_0)^{-1} \quad \text{for } n \ge 1 , \tag{11} $$

$$ \mathcal{R}_0(z_1) = [I z_1 - D_0]^{-1} . \tag{12} $$

Proof The central idea is to apply, besides the z-transform with respect to
the time variable $x$, also a z-transform with respect to the count of arrivals $n$.
Let



h„/(zi) :::= y^P(n,(T)y] z(hn+l{<T+ 1, 



(7 = 0 37 = 0 



Then we have 



hn/(zi)4 = ^ ^ P(«> ^ 4'^n+l{(T +1,2;) = 



n=0 (7=0 37=0 



oo oo / ^ \ 

(7=0 n=0 \ 37=0 / 

(because of eq. (2)) 

oo / ^ \ 

= 53[zr'D(^2/i4^i))]"zrM^(zi)-534Ma;)U.(^i)'-' - m 

(7 = 0 V 37 = 0 / 

(developing into an infinite series yields) 

= [Izi-D(z2/*z(^i))]"' (i/*.(.^i)-Ea„(z2/*^(^i))")i»z(^i)'-' . (16) 



[Izi -D(z2/i.(zi))]-' =: 5^7?.„(zi)z” 



we get: 



£h„.,(zi)z2" = 



53 nn(zi)z^ lh,{zi) - 53 A„[z2h4zi)r I K(zi) 



which yields (due to the Cauchy product formula) 



Azi)= 7^„(zl)/,,4zl)-5]7^fc(zl)A„_fc[/^4^l)^-'•• h,(ziY-^ (18) 
and finally



hn(^i) = T^n{zi)h,(zi) - . (19) 

k=0 

The expression for r = 0 follows directly from eq. (6). 

Determining $\mathcal{R}_n(z_1)$

What is left is the determination of $\mathcal{R}_n(z_1)$, which is the $n$-th coefficient of
the series of $[I z_1 - D(z_2)]^{-1}$. The trick of the DMAP-case in [2] does not work,
therefore a differentiation is used for determining $\mathcal{R}_n(z_1)$:

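$$ \sum_{n=0}^{\infty} \mathcal{R}_n(z_1)\, z_2^n = [I z_1 - D(z_2 h_z(z_1))]^{-1} =: f(z_1, z_2) \tag{20} $$

$$ f(z_1,z_2) = [g(z_1,z_2)]^{-1} , \qquad g(z_1,z_2) := I z_1 - D(z_2 h_z(z_1)) $$

$$ n!\,\mathcal{R}_n(z_1) = \left. \frac{\partial^n}{\partial z_2^n} f(z_1,z_2) \right|_{z_2=0} , \qquad \mathcal{R}_0(z_1) = [I z_1 - D_0]^{-1} \tag{21} $$

Leibniz's Rule

$$ \frac{\partial^n}{\partial z_2^n} [f(z_1,z_2)\, g(z_1,z_2)] = 0 = \sum_{k=0}^{n} \binom{n}{k} f_{z_2}^{(k)}(z_1,z_2)\, g_{z_2}^{(n-k)}(z_1,z_2) \tag{22} $$

yields

$$ f_{z_2}^{(n)}(z_1,z_2) = - \left[ \sum_{k=0}^{n-1} \binom{n}{k} f_{z_2}^{(k)}(z_1,z_2)\, g_{z_2}^{(n-k)}(z_1,z_2) \right] g^{-1}(z_1,z_2) , \tag{24} $$

which results in

$$ n!\,\mathcal{R}_n(z_1) = f_{z_2}^{(n)}(z_1,0) = - \left[ \sum_{k=0}^{n-1} \binom{n}{k} f_{z_2}^{(k)}(z_1,0)\, g_{z_2}^{(n-k)}(z_1,0) \right] g^{-1}(z_1,0) \tag{25} $$

and because of $g^{-1}(z_1,0) = [I z_1 - D_0]^{-1}$,
$f_{z_2}^{(k)}(z_1,0) = k!\,\mathcal{R}_k(z_1)$, and $g_{z_2}^{(n-k)}(z_1,0) = -(n-k)!\, D_{n-k} [h_z(z_1)]^{n-k}$:

$$ \mathcal{R}_n(z_1) = \begin{cases} [I z_1 - D_0]^{-1} & \text{for } n = 0 , \\ \left[ \sum_{k=0}^{n-1} \mathcal{R}_k(z_1)\, D_{n-k}\, [h_z(z_1)]^{n-k} \right] (I z_1 - D_0)^{-1} & \text{for } n \ge 1 . \end{cases} $$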



Applying eq. (19) to eq. (7) yields the expressions for $1 \le r \le N$.
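
The recursion for the series coefficients of $[I z_1 - D(z_2 h_z(z_1))]^{-1}$ can be checked numerically. The Python sketch below is a minimal illustration under stated assumptions: a two-phase input process with hypothetical matrices $D_0, D_1, D_2$, a fixed scalar value for $h_z(z_1)$, and small hand-written 2x2 matrix helpers. None of these names or values come from the paper. It builds the coefficients by the recursion and compares the truncated series with the directly inverted matrix.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_add(a, b, sign=1.0):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_scale(a, s):
    return [[s * x for x in row] for row in a]


def identity(n, s=1.0):
    """s times the n x n identity (s = 0 gives the zero matrix)."""
    return [[s if i == j else 0.0 for j in range(n)] for i in range(n)]


def inv2(a):
    """Inverse of a 2x2 matrix."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]


def r_coefficients(D, hz, z1, n_max):
    """R_0 = (I z1 - D_0)^{-1}; R_n = [sum_{k<n} R_k D_{n-k} hz^{n-k}] (I z1 - D_0)^{-1}."""
    base = inv2(mat_add(identity(2, z1), D[0], sign=-1.0))
    R = [base]
    for n in range(1, n_max + 1):
        acc = identity(2, 0.0)
        for k in range(n):
            if n - k < len(D):        # D_w = 0 beyond the largest batch size
                acc = mat_add(acc, mat_scale(mat_mul(R[k], D[n - k]), hz ** (n - k)))
        R.append(mat_mul(acc, base))
    return R


def direct_inverse(D, hz, z1, z2):
    """[I z1 - D(z2 hz)]^{-1} with D(z) = sum_w D_w z^w."""
    g = identity(2, z1)
    for w, Dw in enumerate(D):
        g = mat_add(g, mat_scale(Dw, (z2 * hz) ** w), sign=-1.0)
    return inv2(g)


# Hypothetical DBMAP with batch sizes 0, 1, 2; the rows of D_0 + D_1 + D_2 sum to 1.
D_example = [
    [[0.5, 0.1], [0.2, 0.4]],
    [[0.2, 0.1], [0.1, 0.2]],
    [[0.05, 0.05], [0.05, 0.05]],
]
```

Summing `R[n] * z2**n` over enough terms and comparing with `direct_inverse` is only meaningful inside the radius of convergence of the series in $z_2$; the values used for testing here are chosen well inside it.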







4.4. Determining the probability function of the unfinished work at
arrival instants

The above theorem allows the computation of arbitrary moments of the
unfinished work at arrival instants. However, for capturing the waiting time seen
by a test-cell of stream 1 or 2 in a superposition, it is necessary to find the
probability function. This can be done by "simply" differentiating the z-transform
often enough and considering the value of the differentiated function at 0. The
problem, however, is that the derivatives up to the $N \cdot D$-th one are needed. The
following Lemma establishes a simple algorithm to determine the value at 0 of
all these derivatives.

Lemma 1 Given the z-transform of the service time hz(z) := 

The probability function of the unfinished work is obtained from 

Define . 

1 d^ 

5R[m][n] , 

ml az^ z=o 

1 d^ 1 
0H[m][n] := — -- — h„(z) 

^ ^ m\dz^ ^ d2=o 

Then it is for 0 < n < N — 1, 0 <m < ND: 

1 Jm ^ 

[h„{z)[h,{z)Y-^]^^^ = .a/»[m- fc][£- 1] 

k=0 

with 

aH[m][n] = 

m m n 

n] • h{m — k) — J2J2mm-dh[m-k][n-£\-A„^^ , 

k=0 k=0l=Q 

^R[m][n] = 

'n — 1 m 

= ^ ^R[Ar][z/] • dh[m — k][n — u]- T>n-u — ^R[m — l][n] 

k=0 

oo 

dh[m][n] = ^^h{k) ' dh[m — k][n — 1] 
k=0 

Initial values: 

dh[m][Q] = lm=o, 5/i[0][n] = l«=o, aR[0][n] = l,=o(-Do)"\ 

aRH[0] = (-Do)-('"+'\ an[0][n] = In^oDo^Ao . 

Proof Apply Leibnitz’ rule to eqs. (9), (11). □ 
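
The idea behind the Lemma is that a scaled derivative at 0, $\frac{1}{m!}\frac{d^m}{dz^m}(\cdot)|_{z=0}$, is simply the coefficient of $z^m$, so products of transforms turn into discrete convolutions of coefficient arrays. The following Python sketch illustrates this for the simplest of the recursions, $\partial h[m][n] = \sum_k h(k)\,\partial h[m-k][n-1]$ with $\partial h[m][0] = 1_{m=0}$, which yields the coefficients of $[h_z(z)]^n$. The function name and the example distributions are assumptions for illustration; the matrix-valued recursions of the Lemma are not reproduced here.

```python
def power_coefficients(h, n_max, m_max):
    """Return dh with dh[m][n] = coefficient of z^m in [h_z(z)]^n.

    h is the service-time p.f. as a list, h[k] = P{service time = k}.
    Recursion: dh[m][n] = sum_{k=0}^{m} h(k) * dh[m-k][n-1], dh[m][0] = 1 if m == 0 else 0.
    """
    dh = [[0.0] * (n_max + 1) for _ in range(m_max + 1)]
    dh[0][0] = 1.0
    for n in range(1, n_max + 1):
        for m in range(m_max + 1):
            k_top = min(m, len(h) - 1)      # h(k) = 0 beyond its support
            dh[m][n] = sum(h[k] * dh[m - k][n - 1] for k in range(k_top + 1))
    return dh


# Hypothetical examples: a two-point service time and a deterministic one (D = 2).
two_point = power_coefficients([0.0, 0.5, 0.5], n_max=2, m_max=4)
deterministic = power_coefficients([0.0, 0.0, 1.0], n_max=3, m_max=6)
```

In the deterministic case the only non-zero coefficient of $[h_z(z)]^n$ sits at $m = nD$, which is a convenient sanity check before moving on to the matrix recursions.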
4.5. Numerical stability

This algorithm proves to be numerically stable if $D_0$ is a diagonal matrix. If
not, the high powers of the inverse of $D_0$ (which has entries significantly greater
than 1, since $D_0$ is sub-stochastic) soon yield exploding matrix entries in
$\partial R$ and $\partial H$, which then cause wrong values of the probability function. Eq. (8) can
also be used for obtaining moments of the unfinished work at arrival instants
by simply considering derivatives at $z = 1$; this works also for non-diagonal matrices
$D_0$, with very good numerical results. However, it is then not possible to deal
with per-stream values as discussed in the next section.



5. Per-stream waiting-time probability functions 

Performance evaluation of ATM networks concentrates on deterministic
service times due to the transmission of fixed-length packets in ATM networks.
We now focus on a multiplexer facing two input streams 1 and 2, stream 1
represented by a DMAP$_1$, and stream 2 by a DMAP$_2$, both being special cases
of a DBMAP.

Let DBMAPi be given by w G INq, D<1) ;= and DMAP 2 

by and The resulting superposition, again a DBMAP, is given by 

D„,,wG1No, E“=oD«; = D; these matrices are obtained by means of the 
Kronecker product (g) according to 

® C(2) for w = 0 

0 0 for 1 < w < b^lx 

® for w = b^lr + 1 • 

Here, we have for stream 1 due to the DMAP^^^ 

(1) _ f w > 0 . 

\ to = 0 ■ ^ ^ 

Now the matrices $D_w$ determine the Semi-Markov kernel governed by the
matrices $A_n$ and $B_n$. The z-transform of the waiting-time of each stream ($i = 1, 2$,
and 3 representing the full stream) is simply obtained by:








i=i 



T£o(I-Do)-1dW for 

Do)~’^Dr +x^)h,._r(zi)[h4zi)]^~^D<;>]for 
0 for 



r 

1 

r 



= 0 

<r<N (28) 
= N + l 



where 

D^f^^ = 0 -f 0 D^^^ , i.e. the arriving batch contains at 

least one cell of stream 1, 
i.e. the arriving batch contains a cell of stream 2,

$D_\mu^{(3)} = D_\mu$, i.e. the arriving batch contains a cell of stream 1 or 2.

The waiting-time of a cell which is not lost is given by the unfinished work
immediately before the arrival instant of that cell plus the length of the cells of
the same batch which are admitted before that cell. As a lost cell cannot see
any waiting-time, the waiting-time takes values in $\{0, 1, \dots, ND\}$. Since the
waiting-time p.f. is related to u{k,i), 0 ^ ^ ^ ATT), normalizing constants 

$\mathcal{N}_1, \mathcal{N}_2, \mathcal{N}_3$ are needed. Note that, as usual, test-cell admission for stream 2 is
completely random. With

ji) o-» fi), i— 1, 2, 3 

it is 

^ND ^max + l -j N 

P{Wi = i] = -^ Y, 

* /i = l ^ xl)=.l r=0 

Mi given by Y!i 3> PWi = ^} = 1- 

5.1. Formula validation 

The above formulae were implemented in C programmes. For diagonal
matrices $D_0$ the results of the per-stream probability functions obtained were in
very good correspondence with those obtained from the method of the
unfinished work described in [7, 10]: for double-precision numbers the values of the
probability function differed first in the 10th figure (of 16). Using derivatives of eq. (8)
at $z = 1$, the results were as good also for non-diagonal matrices. Compared
with the method of the unfinished work, the algorithm presented above is much
faster (and is applicable also for non-deterministic service time distributions),
since the system matrix grows only with the second power of $N \cdot m$ in contrast
to $N \cdot m \cdot D$ for the method of the unfinished work. The latter, however, allows
the computation of transient loss probabilities and waiting-time probability functions
[9]. Table 1 compares the potential of both methods.

6. Numerical Examples 

For the numerical examples, the per-stream mean waiting-times and loss prob- 
abilities were computed keeping the traffic load of the full stream constant at 
0.8, and varying the ratio of the traffic load. Stream 1 is a DMAP with two 
phases C = ^ — (oT-gD * P transition matrix of the embedded 

Markov chain, {bmax = 1, coefficient of correlation of its embedded Markov 
chain equals 0.6), stream 2 is a Bernoulli process. 

It is easily seen in Fig. 4 and 5 that as soon as the correlated stream 1 gets 
stronger, the mean waiting-times and loss probabilities of both streams in the 
superposition increase. Loss probabilities were computed according to [7]. 







Table 1
Comparison of the method of the unfinished work and the M/G/1-paradigm.

                               Unfinished work [6, 9]      M/G/1 paradigm
System size                    (N+1) · D · m,              (N+1) · m
                               slower for D > 1
Non-diagonal C                 no problem                  rounding errors for
                                                           per-stream expressions
Waiting Times                  X                           X
Unfinished Work                X                           X
Loss Probabilities             X                           X
Conditional Loss Probability   X                           -
Transient Quantities           X                           -
Departure Process              -                           X
Continuous Time Analogue       -                           X


7. Conclusions 

This paper presents a fast algorithm for computing waiting-time moments
for the DBMAP/G/1/N queue with arrival first policy, and generalizes the
algorithm in such a way that per-stream waiting-time probability functions of
the DBMAP$_1$+DMAP$_2$/G/1/N queue can be computed. Thus, it extends [7],
where only the computation of per-stream loss probabilities was solved using
the M/G/1-paradigm. Furthermore, it supplements [1], since the
decomposition of the matrix expression $[I z_1 - D(z_2 h_z(z_1))]^{-1}$ into a series in $z_2$, which
is the crucial point in finding the algorithm, can also be applied to the
composition of the matrix expression $[Is + \dots]$ as a series of $z$. The

algorithm was validated by comparing the numerical results obtained by C
programmes with those which result from the method of the unfinished work,
which, for some input processes, can also compute waiting-time moments of
the DBMAP$_1$+DMAP$_2$/D/1/N queue [9]. Very good correspondence was found.
Abstract

Closed-form solutions are characterised, based on the principle of maximum
entropy (ME), for a single server censored G/G/1/$N_1, \dots, N_R$ queue with
$R$ ($R > 1$) priority classes under Head-of-Line (HOL) service discipline and
Partial Buffer Sharing (PBS) scheme. The forms of the joint, aggregate and
marginal state probabilities, as well as basic performance measures such as
mean queue length and cell loss probability, are analytically established at
equilibrium via appropriate mean value constraints and the generating
function approach. Consequently, efficient recursive expressions of low
computational cost are determined. Furthermore, the G/G/1/$N_1, \dots, N_R$ queue is
utilised, in conjunction with appropriate flow approximation formulae, as a
cost-effective building block towards a queue-by-queue decomposition
algorithm of arbitrary open queueing network models (QNMs) under repetitive
service blocking with random destination (RS-RD). Typical numerical results
are included to illustrate the credibility of the approximate algorithm against
simulation in the context of Generalised Exponential (GE)-type buffered
interconnection networks with bursty traffic.

1 INTRODUCTION



Finite buffer queues and network models with service and space priorities are 
of great importance towards effective congestion control and quality of service 
(QoS) protection in Asynchronous Transfer Mode (ATM) networks. 

An ATM cell can be either of high or low priority depending on whether 
the cell loss priority (CLP) bit in the cell’s header has been set or not. A cell 
of high priority has by default its CLP bit set to zero. The CLP bit of low 
priority cells is set to one. It is the job of the priority mechanism to monitor 
the CLP bit of arriving cells and give preferential treatment to high priority 
cells. Priority mechanisms include time priorities and space priorities. 

Time priority mechanisms such as Head-of-Line (HOL), e.g. (Hong and
Suda 1991), and Preemptive Resume (PR), e.g. (Kouvatsos and Tabet-Aouel
1990), take into account that some services may tolerate longer delays than
others (e.g., data versus voice) and deal with the order in which cells are
transmitted. Although time priorities are not explicitly identified in ITU
Recommendation I.361, cf. (ITU Draft Recommendations 1990), they can be
implicitly represented by using combinations of virtual path and channel
identifiers (VPI/VCI).

Space priority mechanisms control the allocation of buffer space to arriving
cells at an input or output port queue of an ATM switch. Implicitly, they
provide several grades of service through the selective discarding of low pri- 
ority cells. This type of priority mechanism exploits the fact that certain cells 
generated by traffic sources are less important than others and may, therefore, 
be discarded without significantly affecting the QoS constraints. Space prior- 
ity mechanisms aim to decrease the cell loss probability and delays for high 
priority cells in comparison with low priority cells. Two main mechanisms for 
space priorities are push-out and partial buffer sharing (PBS). 

Under a push-out mechanism, e.g. (Sumita and Ozawa 1988), (Hebuterne
and Gravey 1989), (Fourneau, Pekergin, and Taleb 1994), (Lin and Silvester
1993), low priority cells finding on arrival the buffer of the queue full will be
lost immediately. High priority cells arriving at a full buffer queue of capacity
$N$ will either replace (push out) low priority cells from the queue (which are
lost), if any, or are lost if only high priority cells are present (cf. Fig. 1).
Note that a combination of time and push-out priorities has been studied by
(Gravey and Hebuterne 1991) and (Nillson, Lai and Perros 1990).
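
The push-out rule described above can be sketched as a single arrival handler. This is a minimal Python illustration, not a model from the cited papers: the function name is hypothetical, the queue is a plain list of booleans, and the choice of which low priority cell is pushed out (here the most recently admitted one) is an assumption, since the text does not specify it.

```python
def push_out_arrival(queue, capacity, high_priority):
    """Handle one arrival under a push-out space priority mechanism.

    queue: list of booleans, True = high priority cell, False = low priority cell.
    Returns (admitted, pushed_out): whether the arriving cell joined the queue and
    whether a low priority cell was discarded to make room for it.
    """
    if len(queue) < capacity:
        queue.append(high_priority)          # free space: any cell may join
        return True, False
    if not high_priority:
        return False, False                  # full buffer: low priority cell is lost
    # Full buffer and high priority arrival: look for a low priority cell to replace.
    # Assumption: the most recently admitted low priority cell is the one pushed out.
    for idx in range(len(queue) - 1, -1, -1):
        if not queue[idx]:
            del queue[idx]
            queue.append(True)
            return True, True
    return False, False                      # only high priority cells present: lost
```

A different replacement policy (for example, pushing out the oldest low priority cell) only changes the search order in the loop.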
Fig. 1: The push-out space priority mechanism (a high priority cell pushes out a low priority cell; buffer capacity $N$)
PBS works by setting a sequence of thresholds $N_i$, $i = 1, 2, \dots, R$,
corresponding to $R$ priority classes (indexed from 1 to $R$ in decreasing order of
priority) of a single queue with finite capacity $N_1$. Highest priority cells of
class $i = 1$ can join the queue simply if there is space. However, lower
priority cells of class $j$, $j = 2, \dots, R$, can join the queue only if the total number
of cells in the queue is less than the threshold value $N_j$. Once the number
of cells waiting for service reaches $N_j$, all lower priority cells of class
$k$, $k = j+1, \dots, R$, will be lost on arrival but higher priority cells of class
$i$, $i = 1, \dots, j-1$, will continue to join the queue until it reaches threshold
value $N_i$, $i = 1, \dots, j-1$ (for $R = 2$ classes, see Fig. 2). Unlike the push-out
mechanism, once a cell of any class is in place, it cannot be lost. Different
cell loss and QoS requirements under various load conditions can be met by
adjusting the threshold values. (Li 1989) and (Yin, Li and Stern 1990) applied
PBS and selective packet discarding for the overload control of packet voice
systems modelled by M/PH/1/N and PH/M/1/N queues. An M/M/G/1/N
queueing system employing the PBS scheme was studied by (Kroner 1989).
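
The PBS admission rule reduces to one comparison per arrival. The Python sketch below is an illustration only; the function names, the list layout of the thresholds and the arrival sequence are assumptions, and departures are deliberately left out so that the effect of the thresholds is easy to follow.

```python
def pbs_admit(total_cells, cls, thresholds):
    """Return True if a class `cls` cell (1 = highest priority) may join the queue.

    thresholds[0] is N_1 (the total capacity); thresholds[j-1] is the threshold N_j
    of class j. A cell joins only if the total number of cells is below N_cls.
    """
    return total_cells < thresholds[cls - 1]


def offer_arrivals(arrival_classes, thresholds):
    """Offer a sequence of arrivals to an initially empty queue without departures.

    Returns (admitted, lost) as dictionaries keyed by class.
    """
    classes = range(1, len(thresholds) + 1)
    admitted = {c: 0 for c in classes}
    lost = {c: 0 for c in classes}
    total = 0
    for cls in arrival_classes:
        if pbs_admit(total, cls, thresholds):
            admitted[cls] += 1
            total += 1
        else:
            lost[cls] += 1
    return admitted, lost


# Hypothetical example with R = 2 classes: N_1 = 10, N_2 = 6.
example_thresholds = [10, 6]
```

With these thresholds, a burst of low priority cells can fill the queue only up to 6 cells, which leaves the remaining 4 places for high priority cells.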



Fig. 2: The partial buffer sharing space priority mechanism (capacity $N_1$)



Some of the limitations associated with the analysis of priority mechanisms 
are due to exponential or deterministic arrivals which limit their usefulness 
whilst numerical and simulation methods for more general models may lead 
to time consuming (or even intractable) solutions as the system’s state-space 
increases. These solutions also fail to reveal explicit functional relationships 
amongst performance metrics. Furthermore, most of the analytic space (and 
time) priority models proposed in the literature focus on a single server queue. 

In this paper, the principle of maximum entropy (ME) (Jaynes 1957a) and
the generating function approach (Williams and Bhandiwad 1976) are applied
to characterise recursive expressions for state probabilities and basic
performance measures for a general single server queue at equilibrium with finite
capacity, $R$ ($R > 1$) priority classes under Head-of-Line (HOL) service
discipline and PBS scheme. Consequently, this queue is used, in conjunction with
appropriate flow approximation formulae, as a cost-effective building block
towards a queue-by-queue decomposition algorithm of arbitrary open
queueing network models (QNMs) under repetitive service blocking with random
destination (RS-RD). Validation tests involving ME and simulation results
are carried out in the context of simple space division ATM switch
architectures represented by arbitrary Generalised Exponential (GE)-type multistage
interconnection networks of switching components with single input/output
port and output queueing. Note that under RS-RD blocking, when a cell upon
service completion at queue $i$ attempts to join a destination queue $j$ whose
capacity is full, it is rejected by queue $j$ and immediately receives another
service at queue $i$. Each time a cell completes service at queue $i$, a downstream
queue is selected independently of the previously chosen destination queue $j$.

The principle of ME provides a self-consistent method of inference for char- 
acterising an unknown but true probability distribution, subject to known 
(or known to exist) mean value constraints. In an information theoretic con- 
text (Jaynes 1957a), the ME solution corresponds to the maximum disorder 
of system states and thus is considered to be the least biased distribution 
estimate of all solutions that satisfy the system’s constraints. In sampling 
terms, (Jaynes 1957b) has shown that, given the imposed constraints, the ME 
solution can be experimentally realised in overwhelmingly more ways than any 
other distribution. Major discrepancies between the ME distribution and the 
experimentally observed distribution indicate that important physical con- 
straints have been overlooked. Conversely, experimental agreement with the 
ME solution represents evidence that the constraints of the system have been 
properly identified. Note that the methodology of ME has been applied un- 
der general conditions to analyse arbitrary QNMs with non-priority classes 
and finite capacity, e.g., (Kouvatsos and Denazis 1993). Details on entropy 
maximisation and systems modelling can be seen in (Kouvatsos 1994). 
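
As a small worked illustration of the principle (a textbook case, not one of the queueing models analysed in this paper), consider a distribution $p(n)$, $n = 0, 1, 2, \dots$, about which only normalisation and the mean $\langle n \rangle$ are known. Lagrange's method gives a geometric form; the multipliers $\beta_0, \beta_1$ and the symbols $x$, $Z$ below are introduced purely for this example.

```latex
% Maximise H(p) = -\sum_{n \ge 0} p(n) \log p(n)
% subject to \sum_n p(n) = 1 and \sum_n n\,p(n) = \langle n \rangle.
\begin{align*}
L &= -\sum_{n} p(n)\log p(n)
     - \beta_0\Big(\sum_n p(n) - 1\Big)
     - \beta_1\Big(\sum_n n\,p(n) - \langle n\rangle\Big),\\
\frac{\partial L}{\partial p(n)} &= -\log p(n) - 1 - \beta_0 - \beta_1 n = 0
   \;\Longrightarrow\; p(n) = \frac{1}{Z}\,x^{n},
   \quad x = e^{-\beta_1},\; Z = e^{1+\beta_0},\\
\sum_n p(n) = 1 &\;\Longrightarrow\; Z = \sum_{n\ge 0} x^n = \frac{1}{1-x},\\
\sum_n n\,p(n) = \langle n\rangle &\;\Longrightarrow\; \frac{x}{1-x} = \langle n\rangle
   \;\Longrightarrow\; x = \frac{\langle n\rangle}{1+\langle n\rangle}.
\end{align*}
```

Each additional mean value constraint contributes one more multiplicative factor of this kind, which is why ME solutions subject to several mean value constraints take a product form.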

For illustration purposes, it is assumed that the traffic entering and flowing 
in the network is bursty and is represented by a Compound Poisson Pro- 
cess (CPP) with geometrically distributed bulk sizes (Kouvatsos 1994). This 
particular process implies a GE interarrival-time distribution of the form 



$$ F(t) = P(X \le t) = 1 - \tau e^{-\tau \nu t} , \quad t \ge 0 , \tag{1} $$

where $\tau = 2/(C^2 + 1)$, $X$ is the inter-event time random variable and $\{1/\nu, C^2\}$
are the mean and squared coefficient of variation (SCV) of the inter-event time
distribution, respectively (cf. Fig. 3).

Fig. 3: The GE distribution with parameters $\tau$ and $\tau\nu$ ($0 < \tau < 1$)

The choice of the GE distribution is
motivated by the fact that measurements of actual traffic or service times
are generally limited and so only few parameters can be computed reliably.
Typically, only the mean and variance can be relied upon, and thus, a choice
of a distribution which implies least bias (i.e., introduction of arbitrary and,
therefore, false assumptions) is that of GE-type distribution. In the context
of an ATM environment, this model is particularly applicable in cases of traffic
with low level of correlation or where smoothing schemes are introduced at the
adaptation level (e.g., for a stored video source) with the objective of
minimising or even eliminating the problem of traffic correlation (Ball, Hutchinson
and Kouvatsos 1996). Moreover, under renewality assumptions, the GE
distribution is most appropriate to model simultaneous cell arrivals at output
port queues generated by different bursty sources (e.g., voice or high
resolution video) with known first two moments. In this context, the burstiness
of the arrival process is characterised by the squared coefficient of variation
(SCV) of the interarrival time or, equivalently, the size of the incoming bulk
(cf. (1)).
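
The way the two GE parameters encode the mean and the SCV can be verified directly. Reading (1) as a mixture (a mass of $1 - \tau$ at zero, and with probability $\tau$ an exponential time of rate $\tau\nu$), the Python sketch below computes the first two moments in closed form. The function names are assumptions made for this illustration, and the mixture reading presupposes $C^2 \ge 1$ so that $0 < \tau \le 1$.

```python
import math


def ge_tau(scv):
    """tau = 2 / (C^2 + 1) for a given squared coefficient of variation C^2 >= 1."""
    return 2.0 / (scv + 1.0)


def ge_cdf(t, nu, scv):
    """F(t) = 1 - tau * exp(-tau * nu * t) for t >= 0, and 0 for t < 0."""
    if t < 0:
        return 0.0
    tau = ge_tau(scv)
    return 1.0 - tau * math.exp(-tau * nu * t)


def ge_moments(nu, scv):
    """Mean and SCV recovered from the mixture reading of the GE distribution."""
    tau = ge_tau(scv)
    rate = tau * nu                       # rate of the exponential branch
    mean = tau / rate                     # tau * E[Exp(rate)]
    second = tau * 2.0 / rate ** 2        # tau * E[Exp(rate)^2]
    return mean, second / mean ** 2 - 1.0


# Hypothetical traffic stream: rate nu = 2 cells per time unit, C^2 = 5.
mean_example, scv_example = ge_moments(2.0, 5.0)
```

The jump $F(0) = 1 - \tau$ is the probability of a zero inter-event time, i.e. of a further arrival within the same bulk, which is how the SCV controls the bulk size.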

The form of the ME queue length distribution, subject to appropriate
mean value constraints, for a single server censored queue at equilibrium with
$R$ ($R > 1$) HOL priority classes and PBS scheme is characterised in Section
2 together with recursive relationships describing performance metrics, based
on the generating function approach (Williams and Bhandiwad 1976). This
ME solution is used in Section 3 as a building block, in conjunction with flow
approximation formulae, towards the creation of a queue-by-queue
decomposition algorithm of arbitrary open QNMs under RS-RD blocking. A case
study, based on the analysis of a single server GE-type queue with $R$ ($R > 1$)
HOL priority classes and PBS scheme, is carried out in Section 4. Validation
tests involving GE-type ME and simulation results are given in Section 5.
Conclusions follow in Section 6.

2 ME ANALYSIS OF G/G/1/$N_1, \dots, N_R$ QUEUE

Consider an arbitrary single server queue at equilibrium with $R$ ($R \ge 2$) HOL
priority classes and PBS scheme denoted by G/G/1/$N_1, \dots, N_R$ such that (i)
the interarrival times and service times per class are generally (G) distributed,
(ii) the total buffer capacity is $N_1$ and (iii) the PBS scheme is specified by the
sequence of thresholds $\{N_i, i = 2, 3, \dots, R\}$. Moreover, for each class
$i = 1, 2, \dots, R$, the arrival process is assumed to be censored (i.e., a cell will be
lost if on arrival it finds a full buffer) with mean rate $\lambda_i$ and a given interarrival time
squared coefficient of variation (SCV). Cells are transmitted by a single
server with mean rate $\mu_i$ and a given service time SCV.

Suppose that the priority classes are indexed from 1 to $R$ in decreasing
order of priority. Let at any given time the state of the system be described
by a vector $S = (n_1, n_2, \dots, n_R, \omega)$, where $n_i$, $i = 1, \dots, R$, is the number
of class $i$ cells in the queue (waiting for or receiving service) and $\omega$ is the
variable indicating the class of the current cell in service (n.b., for an idle
queue $\omega = 0$). Let $Q$ be the set of all feasible states $\{S\}$ and $P(S)$ be at any
given time the equilibrium probability that the G/G/1/$N_1, \dots, N_R$ priority
queue is in state $S$ and $\pi_i$ be the cell loss probability that an arriving cell
of class $i$, $i = 1, 2, \dots, R$, will find the queue occupied up to at least its
corresponding threshold $N_i$, $i = 1, 2, \dots, R$.



2.1 Prior Information 

For each state $S \in Q$, and class $i$, $i = 1, 2, \dots, R$, the following auxiliary
functions are defined:

$n_i(S)$ = the number of class $i$ cells present in state $S$,

$s_i(S)$ = 1, if the cell in service is of class $i$; 0, otherwise,

$h_i(S)$ = 1, if $n_i > 0$; 0, otherwise,

$f_i(S)$ = 1, if $\sum_{j=1}^{R} n_j(S) = N_i$ and $s_i(S) = 1$; 0, otherwise.

Suppose, now, all that is known about the state probabilities $\{P(S)\}$ is that
the following mean value constraints exist:

(i) Normalisation,

$$ \sum_{S \in Q} P(S) = 1 \tag{2} $$

(ii) Server utilisation, $U_i$,

$$ \sum_{S \in Q} s_i(S) P(S) = U_i , \quad 0 < U_i < 1 , \; i = 1, 2, \dots, R \tag{3} $$

(iii) Busy state probability per class, $\theta_i$, $0 < \theta_i < 1$,

$$ \sum_{S \in Q} h_i(S) P(S) = \theta_i , \quad i = 1, 2, \dots, R \tag{4} $$

(iv) Mean queue length, $\langle n_i \rangle$,

$$ \sum_{S \in Q} n_i(S) P(S) = \langle n_i \rangle , \quad U_i < \langle n_i \rangle < N_i , \; i = 1, 2, \dots, R \tag{5} $$

(v) Full buffer state probability, $\phi_i$,

$$ \sum_{S \in Q} f_i(S) P(S) = \phi_i , \quad 0 < \phi_i < 1 , \; i = 1, 2, \dots, R \tag{6} $$

satisfying the flow balance equations, namely

$$ \lambda_i (1 - \pi_i) = \mu_i U_i , \quad i = 1, 2, \dots, R \tag{7} $$

The choice of mean values (2) - (6) is based on the type of constraints used
for the ME analysis of the stable single class FCFS G/G/1/N queue (Kouvatsos
1986). Note that if additional constraints are used, it is no longer feasible to
capture a closed-form ME solution. Such a solution is an essential building block
in the context of a cost-effective queue-by-queue decomposition algorithm for
arbitrary QNMs. Conversely, the removal of one or more constraints from
the set (2) - (6) will result in an ME solution with reduced accuracy, cf.
(Jaynes 1957a), (Jaynes 1957b).



2.2 Maximum Entropy Solution 



The form of the state probability distribution, $P(S)$, $S \in Q$, can be
characterised by maximising the entropy functional $H(P) = -\sum_{S \in Q} P(S) \log P(S)$,
subject to constraints (2) - (6). By employing Lagrange's method of
undetermined multipliers, the ME solution is expressed by

$$ P(S) = \frac{1}{Z} \prod_{i=1}^{R} g_i^{s_i(S)}\, \xi_i^{h_i(S)}\, x_i^{n_i(S)}\, y_i^{f_i(S)} , \quad \forall S \in Q , \tag{8} $$

where $Z$ is the normalising constant and $\{g_i, \xi_i, x_i, y_i\}$ are the Lagrangian
coefficients corresponding to constraints (3) - (6), respectively. Furthermore,
aggregating (8) over all feasible states $S \in Q$, the joint ME queue length
distribution is given by:

■P(n) = E 9syP^-\ 0<rii<Ni, k'^Jii<Ni (9) 

i=l s=iAns>0 i=l 



where $n = (n_1, n_2, \dots, n_R)$ and $P(0) = 1/Z$.



2.3 Estimation of the Lagrangian Coefficients 

The Lagrangian coefficients $g_i$, $\xi_i$ and $x_i$, $i = 1, \dots, R$, can be approximately
determined in terms of input parameters via closed form asymptotic
expressions based on the ME solution of the corresponding G/G/1 HOL priority
queue at equilibrium, cf. (Kouvatsos and Tabet-Aouel 1990), and are given
by:



ft 

{rii) - 9j _ Pi p-9j |-j p-6i 

{rii) ’ “ {I- p)6i- Pi . p- pC 

{6j - Pi) (1 - Xj) 

{p - 6i) Xi 



(10) 



where $\rho_i = \lambda_i/\mu_i$, $\rho = \sum_{i=1}^{R} \rho_i$ and constraints $\{\langle n_i \rangle, \theta_i\}$ are the associated
performance metrics of the infinite capacity G/G/1 HOL queue (Kouvatsos
and Tabet-Aouel 1990).

Note that the Lagrangian coefficients $y_i$, $i = 1, 2, \dots, R$, are determined via
recursive expressions in Section 4.2 by making use of flow balance condition
(7) and GE-type cell loss probabilities, $\pi_i$, $i = 1, 2, \dots, R$.



2.4 Recursive Relationships 

Let us define 



$Q_i = \{S \in Q : s_i(S) = 1\}$, $\forall i = 1, 2, \dots, R$,
$S_0 = \{S \in Q : s_i(S) = 0, \; \forall i = 1, 2, \dots, R\}$,

and Vi = 1, 2, . . . , -R, & = 0, 1, . . . , iVi 



= Is/Se Qi,0 < rij < Nj, j = 1, . . . ,R& < iVi V , 



i=i 



= < *S'/5 € Qi,0 < rij < v\rij < Nj, j = 1, . . . ,R& 

1 j=i 



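The ME solution for each state $S$, $S \in Q_i$, $i = 1, 2, \dots, R$, is given by

$$ P(S) = \frac{1}{Z}\, g_i\, \xi_i\, x_i^{n_i(S)}\, y_i^{f_i(S)} \prod_{j \ne i} x_j^{n_j(S)}\, \xi_j^{h_j(S)} , \quad \forall S \in Q_i , \tag{11} $$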
Ni

and since $U_i = P(Q_i) = \sum_{S \in Q_i} P(S)$, it follows that

V = 1 



— rvQi^i^i 




n 



hj(S) 



= 1 , 



,,R, (12) 



5gAV i^j=l 



where S{v) = l^ifv > iV^ — 1 or 0, otherwise. A more suitable formula, for 
computing the complex sums of eq. (12), can be determined by using the 
generating function approach as follows: 

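Define

$$ P_j(z) = \sum_{n_j=0}^{N_j} \xi_j^{h_j}\, x_j^{n_j} z^{n_j} = \frac{1 - (1 - \xi_j) x_j z - \xi_j (x_j z)^{N_j+1}}{1 - x_j z} , \tag{13} $$

$$ G_R(z) = \sum_{v=0}^{\infty} C_R(v)\, z^v = \prod_{j=1}^{R} P_j(z) = \prod_{j=1}^{R} \left( \frac{1 - (1 - \xi_j) x_j z - \xi_j (x_j z)^{N_j+1}}{1 - x_j z} \right) . \tag{14} $$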



It is observed that the coefficient of 2 :^ in Gr{z) is given by 

<^«(«) = E n 

seA- j=i 



where (0) = 



d^Gniz) 

dz^ 



\z=Q 



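By enhancing this definition of z-transform, let us define

$$ P_i(z) = \sum_{n_i=0}^{N_i-1} x_i^{n_i} z^{n_i} = \frac{1 - (x_i z)^{N_i}}{1 - x_i z} , $$

$$ G^{(i)}(z) = \sum_{v=0}^{\infty} C^{(i)}(v)\, z^v = P_i(z) \prod_{j \ne i} P_j(z) , \tag{16} $$

where $P_j(z)$ is given from eq. (13). Thus, eq. (16) becomes

$$ G^{(i)}(z) = \frac{1 - (x_i z)^{N_i}}{1 - x_i z} \prod_{j \ne i} \frac{1 - (1 - \xi_j) x_j z - \xi_j (x_j z)^{N_j+1}}{1 - x_j z} , \tag{17} $$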




(18) 



where is given by 

= Y, xr n 

iz^j=i 

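Moreover, the coefficients $C^{(i)}(v)$, $v = 0, 1, \dots$, $i = 1, 2, \dots, R$, can be
computed via the following recursive formula

$$ C^{(i)}(v) = C(v) - x_i^{N_i} C(v - N_i) + (1 - \xi_i)\, x_i\, C^{(i)}(v-1) + x_i^{N_i+1} \xi_i\, C^{(i)}(v - N_i - 1) , \tag{19} $$

with initial conditions $C^{(i)}(v) = 0$ for $v < 0$ and $C^{(i)}(0) = 1$,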

where C{v) = Cr{v), and Cr(v) can be calculated recursively by 



Cr{v) = XrCr(v — 1) ~h Cr-l{v) ~ (1 “ 




T— 1 
1 

1 

7 

+ 

1 




r 0, 


u < 0, 


with initial conditions Cr{v) = < 1, 


u = 0, 




u > 0. 


Substituting (18) into (12) yields 





Ui = ^Qi^iXi ^ Y • 

From eq. (21), the following expression for Z is derived: 
R /iVi-l \ 

= 1 + ( Z I • 

i=l \ 11=0 / 



(20) 



( 21 ) 



(22) 



2.5 Aggregate State Probabilities 

Unconditioning expressions (21) over all classes, an expression for the
aggregate utilisation, $U$, can be obtained with aggregate arrival rate $\Lambda$. The
aggregate probability distributions, $\{P(v), v = 0, 1, \ldots, N_1\}$, are given by

P(0) = P(So)^^, (23) 

1 ^ 

= ^y^<7i^io;i2/f’^^C'(*)(t;-l),Vn = l,2,...,iVi. (24) 

i=1

2.6 Marginal State Probabilities

The marginal state probabilities $\{P_i(k), k = 0, 1, \ldots, N_i\}$, $i = 1, 2, \ldots, R$, can
be determined as follows:

Let us define 






S/S &Q, 0<rij<NjJ = l,...,R,k ^ n,- < ATj I , 



i 5/5 e Q, 0 < nj < v-nj < Nj, j = 1, . . . ,R, k ^ = w > 

y iAj=i ) 

$\forall\, j, i = 1, 2, \ldots, R$, $i \neq j$ & $v = 0, 1, \ldots, N_i$. Using the definitions of $P(S_0)$ and
$P(S)$, it follows that



pm 



i^j=l 



v=0 



SeAy r^i=l 
r^j 



p^ik) = E +E^(-5) 



i:^j=l \S^Qj J SEQi 

m=k rii=k 

E ( E fwV E ^(s) 

i^j=i \ses'j^'^ 
rij>l 
rii=k 



ses] 

7ii = k 



7Vi 



p(So)+ E E pis)]=p{So)+ E E 

i^j=l\seQj J i#J=lVs€S'~i 

”•=“ Tlj>l 

ifi+ E E 5' n 1 1. (®) 



fl min{Nj —l,Ni—k — l) 

= E E 

v=0 






Ni-k 



+9i E yf'’^ E n 

V=0 r::^i=l J 



E y n 

(26) 



SeAy r^^i=l 



For a cost-effective computation of marginal probabilities, recursive formulae 
can be derived for the complex sums of expressions (25) and (26) as follows: 







Removing the zth class in eq. (18), Cl^\v) is expressed by 

rij 



C^(«) = <( 



Es 6 > 4 - nA^i=i 



r^j 



ch.(S) 



E n , 

SeA]^ r^i=l 



3 = ^ 



where 



v=0 



(27) 



(28) 



These coefficients can be determined via the following recursive formulae:



- XiC^^^v - 1) -h (1 - ^i)xiC^^\v - 1) 

(v) — XiC^^^ (?; — 1) -h x^" (v — Ni), i = j 



(29) 



with initial conditions (v) 



0 , < 0 , 

1, u = 0. 

By substituting these recursive formulae (29) in (25) and (26), the following 
expressions for marginal probabilities are obtained: 



^’i(O) = I (l+ E ( E ) > 

\ V ^^=0 / / 

( R /min{Nj-l,Ni-k-l) 

Pi(k) = ^ Qj^jXj j 



(30) 






v=0 



R /Ni-k \\ 

E ( E ) I ,fc= l,...,A^i. 

i^j=l V v=0 / J 



(31) 



2.7 Marginal Mean Queue Lengths 

The marginal mean queue lengths (MQLs) $\{L_i, i = 1, 2, \ldots, R\}$ and associated
Lagrangian coefficients $\{x_i\}$ satisfy the relationship $L_i = (x_i/Z)\,\partial Z/\partial x_i$,
$i = 1, 2, \ldots, R$ (Jaynes 1957a). Defining an appropriate z-transform and after some
manipulations, it follows that

ii = I £ 2/f (t;) j , i = l, 2,..., R (32)
where $\delta(v) = 1$ if $v = N_i - 1$ and 0 otherwise, and

(«) = («-!) + (v), v = l,2,...,Ni-l, (33) 

X<i 

with initial condition: Cx^^(O) = C^^\\)!xi. C^^\v) values which can be cal- 
culated using (19). 
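
As a plausibility check of the relationship $L_i = (x_i/Z)\,\partial Z/\partial x_i$ quoted at the start of this section, the following Python sketch evaluates both sides numerically for a deliberately simplified single-class case. The truncated geometric form $p(n) = x^n/Z$, the function names and the parameter values are illustrative assumptions and do not reproduce the multi-class solution (11).

```python
def mean_queue_length(x, capacity):
    """Mean of the truncated geometric solution p(n) = x**n / Z, n = 0..capacity."""
    weights = [x ** n for n in range(capacity + 1)]
    z = sum(weights)
    return sum(n * w for n, w in enumerate(weights)) / z


def mean_via_normalising_constant(x, capacity, h=1e-6):
    """Same mean obtained as (x / Z) * dZ/dx, using a central difference."""
    def z(value):
        return sum(value ** n for n in range(capacity + 1))

    dz_dx = (z(x + h) - z(x - h)) / (2.0 * h)
    return x * dz_dx / z(x)


# Hypothetical values chosen only for the check.
direct = mean_queue_length(0.8, 10)
derived = mean_via_normalising_constant(0.8, 10)
print(direct, derived)
```

Both routes give the same value up to the error of the numerical derivative, which is the property exploited when the mean queue lengths are derived from $Z$ rather than by summing over states.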

3 OPEN QUEUEING NETWORKS 

Consider an arbitrary open queueing network with $M$ single server queueing
stations, $R$ distinct classes of cells, HOL scheduling discipline, PBS scheme
with thresholds $\{N_{ki}, i = 1, \ldots, R, k = 1, \ldots, M\}$ and RS-RD blocking
mechanism. Let

$\alpha_{kimj}$ be the transition probability (first order Markov chain) that a class
$i$ cell transmitted from station $k$ attempts to join station $m$ as class $j$,

$\hat{\alpha}_{kimj}$ be the effective transition probability,

$\alpha_{ki0}$ be the transition probability that a cell of class $i$ leaves the network
upon finishing transmission at station $k$,

$\lambda_{0ki}, C^2_{a0ki}$ be the mean rate and SCV of the external interarrival process
of class $i$ cells into station $k$, respectively,

$\lambda_{ki}, C^2_{aki}$ be the mean rate and SCV of the overall actual interarrival
process of class $i$ cells at station $k$, respectively,

$\hat{\lambda}_{ki}, \hat{C}^2_{aki}$ be the mean rate and SCV of the overall effective interarrival
process of class $i$ cells at station $k$, respectively,

$\mu_{ki}, C^2_{ski}$ be the mean rate and SCV of the actual service process of class
$i$ at station $k$, respectively,

$\hat{\mu}_{ki}, \hat{C}^2_{ski}$ be the mean rate and SCV of the effective service process of class
$i$ cells at station $k$, respectively,

$\pi_{kimj}$ be the blocking probability that a cell of class $i$ upon its service
completion (call it "completer") from station $k$ will be blocked by station
$m$, as class $j$,

$\pi_{ki}$ be the blocking probability that a completer from any station $m \neq k$
of class $i$ is blocked by station $k$,

$\pi_{0ki}$ be the blocking probability that an external arrival of class $i$ is blocked
by station $k$,

$\pi_{cki}$ be the blocking probability that a completer cell of class $i$ will be
blocked by a downstream station.



3.1 The ME Product-Form Solution 

At any given time, let $n_{ki}$ be the number of cells of class $i$ at queue $k$,
$S_k = (n_{k1}, n_{k2}, \ldots, n_{kR}, c)$ be the state of queue $k$, where $c$ represents the class of
a cell in service, $S = (S_1, S_2, \ldots, S_M)$ be the state of the network, and $Q_k$, $Q$
be the sets of all feasible states $S_k$ and $S$, respectively.

The form of the ME solution $P(S)$, $S \in Q$, subject to normalisation and
marginal constraints of the type (2)-(6), can be clearly established in terms
of the product-form approximation

M / R R \ 

pist'ln E . (M) 

k=\ \i=l s=lAns>0 / 

where $Z$ is the normalising constant and $\{g_{ki}, x_{ki}, y_{ki}\}$ are the Lagrangian
coefficients that correspond to constraints (3)-(6), respectively. Defining
$Z_k$, $k = 1, 2, \ldots, M$, as

2*= E E (35) 

SkGQk \i=l s=lAns>0 / 

expression (34) can be written as the product of the marginal probabilities
$P_k(S_k)$, $S_k \in Q_k$, i.e.,

$$P(S) = \prod_{k=1}^{M} P_k(S_k), \qquad (36)$$

where $P_k(S_k)$ is the marginal ME solution of queue $k$.

Ft Ft 

= E (37) 

^ i=l s=lAns>0 

(c.f., (8)). 

3.2 Queue-By-Queue Decomposition Algorithm

The ME queue-by-queue decomposition algorithm for the approximate analysis
of arbitrary open QNMs with single server queueing stations, $R$ ($R > 1$)
HOL priority classes and PBS scheme under RS-RD blocking is described
below. It is an extension of the earlier ME algorithm of Kouvatsos and
Denazis (1993) dealing with multiple class open FCFS QNMs with arbitrary
topology and repetitive service (RS) blocking. The algorithm incorporates a
feedback correction of the transition probabilities in order to mitigate the
strong underlying assumption that arrival streams per class are renewal
processes. Furthermore, the algorithm describes the computational process of
solving the non-linear equations for the cell loss $\{\pi_{0ki}\}$ and blocking $\{\pi_{kimj}\}$
probabilities under appropriate flow formulae for the first two moments of
merging, splitting and departing streams (assumed to be known). Note that
the probabilities $\{\pi_{0ki}, \pi_{kimj}\}$ generally depend on the effective cell flow
balance equations for $\{\hat{\lambda}_{0ki}, \hat{\lambda}_{ki}\}$, the effective service time parameters
$\{\hat{\mu}_{ki}, \hat{C}^2_{ski}\}$ and the overall interarrival time parameters $\{\lambda_{ki}, C^2_{aki}\}$.

INPUT DATA 

• $M$, $R$,

• $\forall\, k, m$, $k = 1, 2, \ldots, M$, $m = 0, 1, \ldots, M$ & $\forall\, i, j = 1, 2, \ldots, R$,

• $\{N_{ki}, \lambda_{0ki}, C^2_{a0ki}, \mu_{ki}, C^2_{ski}, \alpha_{kimj}\}$;

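Step 1 Feedback correction

For each queue $k$, $k = 1, 2, \ldots, M$, and class $i$, $i = 1, \ldots, R$, with $\alpha_{kiki} > 0$,
substitute

$\mu_{ki} \leftarrow \mu_{ki}(1 - \alpha_{kiki})$,

$C^2_{ski} \leftarrow \alpha_{kiki} + (1 - \alpha_{kiki}) C^2_{ski}$,

$\alpha_{kimj} \leftarrow 0$ if $k = m$, and $\alpha_{kimj} \leftarrow \alpha_{kimj}/(1 - \alpha_{kiki})$ if $m \neq k$, $\forall\, m = 1, 2, \ldots, M$,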

Step 2 Initialize $\pi_{0ki}$ & $\pi_{kimj}$ to any value in (0,1), $\forall\, k, m = 1, 2, \ldots, M$ and
$\forall\, i, j = 1, 2, \ldots, R$,

Step 3 Solve the system of non-linear equations $\{\pi_{0ki}, \pi_{kimj}\}$,



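Step 3.1 Calculate effective flow transition probabilities $\{\hat{\alpha}_{kimj}\}$:

$\hat{\alpha}_{kimj} = \alpha_{kimj}(1 - \pi_{kimj})/(1 - \pi_{cki})$,

$\hat{\alpha}_{ki0} = \alpha_{ki0}/(1 - \pi_{cki})$,

$\pi_{cki} = \sum_{m=1,\, m \neq k}^{M} \sum_{j=1}^{R} \alpha_{kimj}\pi_{kimj}$; $\forall\, k, m, i, j$;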



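Step 3.2 Calculate effective cell flow balance equations:

$\hat{\lambda}_{0ki} = \lambda_{0ki}(1 - \pi_{0ki})$,

$\hat{\lambda}_{ki} = \hat{\lambda}_{0ki} + \sum_{m=1}^{M} \sum_{j=1}^{R} \hat{\alpha}_{mjki}\hat{\lambda}_{mj}$; $\forall\, k, m, i, j$;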



Step 3.3 Calculate the effective service parameters:

$\hat{\mu}_{ki} = \mu_{ki}(1 - \pi_{cki})$,

$\hat{C}^2_{ski} = \pi_{cki} + (1 - \pi_{cki}) C^2_{ski}$;

Step 3.4 Calculate overall interarrival parameters,

A2

fei~ 



•TTfci . 



= 

^aA:i 1— TTfci 

■^A:z ~ VAj, ij 



2 5 ^kimj ) j 



T^ki 



^Oki'^Oki ~f“ X/m=l X^?‘=l i.^rnjki^rnjki'^mjki / '^mjki)) 



AoH + Z)m=i SjLl /(I '^mjki)) 






where $G_{ki}$ is a suitable flow superposition function.

Step 3.5 Obtain new values for $\{\pi_{0ki}, \pi_{mjki}\}$ by applying the Newton-Raphson
method,



Step 4 Calculate interdeparture times 

^dki “ ^ki{^kiiC‘^ki^ ki)i 



where $F_{ki}$ is a suitable flow interdeparture function.

Step 5 Calculate new value for $C^2_{aki}$,

Step 6 Return to Step 3 until convergence of $C^2_{aki}$,

Step 7 Obtain performance metrics of interest. 

The main computational cost of the proposed algorithm is of $O\{kR^3M^3\}$,
where $k$ is the number of iterations in Step 3 and $R^3M^3$ is the number of
operations for inverting the associated Jacobian matrix of the system of
non-linear equations $\{\pi_{0ki}, \pi_{kimj}\}$. However, if a quasi-Newton numerical method is
employed, this cost can be reduced to be of $O\{kR^2M^2\}$. Moreover, the
existence and uniqueness of the solution of the non-linear system of Step 3 cannot
be shown analytically due to the complexity of the expressions of the cell loss
(or blocking) probabilities $\{\pi_{0ki}, \pi_{kimj}\}$; nevertheless, numerical instabilities
were never observed during extensive experimentation under any feasible set
of initial values. Note that the case of RS-FD blocking can be easily
incorporated within the algorithm.
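
To make Steps 3.1-3.3 of the algorithm more concrete, the following Python sketch applies them once for a fixed set of blocking probabilities. It is a reading of those steps under stated assumptions, not the authors' implementation: the function and variable names are hypothetical, a (station, class) pair is treated as one node, the flow balance equations are solved by simple fixed-point iteration, and the blocking probability used in the example is invented. The routing, rates and SCVs are the class 1 values of Experiment 1.

```python
def effective_parameters(nodes, lam0, mu, scv_s, alpha, alpha_out, pi0, pi):
    """Apply Steps 3.1-3.3 once for fixed blocking probabilities.

    nodes        : list of (station, class) pairs
    alpha[a, b]  : routing probability from node a to node b
    alpha_out[a] : probability that a cell leaves the network after node a
    pi0[a]       : blocking probability of external arrivals at node a
    pi[a, b]     : blocking probability of a completer from a at node b
    """
    # Step 3.1: downstream blocking seen by a completer, then effective routing.
    pi_c = {a: sum(alpha.get((a, b), 0.0) * pi.get((a, b), 0.0)
                   for b in nodes if b[0] != a[0])
            for a in nodes}
    alpha_eff = {(a, b): p * (1.0 - pi.get((a, b), 0.0)) / (1.0 - pi_c[a])
                 for (a, b), p in alpha.items()}
    alpha_out_eff = {a: alpha_out.get(a, 0.0) / (1.0 - pi_c[a]) for a in nodes}

    # Step 3.2: effective flow balance (fixed-point iteration is an assumption;
    # any linear solver would do).
    lam0_eff = {a: lam0.get(a, 0.0) * (1.0 - pi0.get(a, 0.0)) for a in nodes}
    lam_eff = dict(lam0_eff)
    for _ in range(10_000):
        new = {b: lam0_eff[b] + sum(alpha_eff.get((a, b), 0.0) * lam_eff[a]
                                    for a in nodes)
               for b in nodes}
        converged = max(abs(new[a] - lam_eff[a]) for a in nodes) < 1e-12
        lam_eff = new
        if converged:
            break

    # Step 3.3: effective service rate and SCV under RS-RD blocking.
    mu_eff = {a: mu[a] * (1.0 - pi_c[a]) for a in nodes}
    scv_s_eff = {a: pi_c[a] + (1.0 - pi_c[a]) * scv_s[a] for a in nodes}
    return alpha_eff, alpha_out_eff, lam_eff, mu_eff, scv_s_eff


nodes = [(1, 1), (2, 1), (3, 1)]
alpha = {((1, 1), (2, 1)): 0.5, ((1, 1), (3, 1)): 0.5,
         ((2, 1), (1, 1)): 0.4, ((3, 1), (1, 1)): 0.3}
alpha_out = {(1, 1): 0.0, (2, 1): 0.6, (3, 1): 0.7}
lam0 = {(1, 1): 1.0}
mu = {(1, 1): 6.25, (2, 1): 3.75, (3, 1): 4.0}
scv_s = {(1, 1): 2.0, (2, 1): 3.0, (3, 1): 3.0}
pi = {((1, 1), (2, 1)): 0.2}  # hypothetical blocking probability

alpha_eff, alpha_out_eff, lam_eff, mu_eff, scv_s_eff = effective_parameters(
    nodes, lam0, mu, scv_s, alpha, alpha_out, {}, pi)
print(lam_eff[(1, 1)], mu_eff[(1, 1)], scv_s_eff[(1, 1)])
```

In the full algorithm this pass would sit inside the Newton-Raphson loop of Step 3, with the blocking probabilities updated from the solution of each building block queue.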



4 CASE STUDY: THE GE/GE/1/$N_1, \ldots, N_R$ QUEUE

In this section, the ME methodology is applied towards the approximate
analysis of GE-type open QNMs with $R$ ($R > 1$) HOL priority classes and a PBS
scheme under the RS-RD blocking mechanism. The GE/GE/1/$N_1, \ldots, N_R$ queue
with GE-type interarrival and service time distributions, in conjunction with
GE-type probabilistic arguments, is used to determine the cell loss
probabilities $\{\pi_i, i = 1, 2, \ldots, R\}$ and the Lagrangian coefficients $y_i$, $i = 1, 2, \ldots, R$,
of the ME solution (9). Furthermore, the GE/GE/1/$N_1, \ldots, N_R$ queue can
play the role of a cost-effective building block, together with GE-type flow
formulae for departing and merging streams, towards the approximate analysis
of corresponding arbitrary GE-type open QNMs under RS-RD blocking.

4.1 Cell Loss Probabilities

Consider a GE/GE/1/$N_1, \ldots, N_R$ queue with non-zero interarrival time and
service time stage selection probabilities $\sigma_i = 2/(C^2_{ai} + 1)$ and $r_i = 2/(C^2_{si} + 1)$,
respectively. Since each arriving stream of class $i$ is a bulk stream at Poisson
arriving instants, it is assumed that each of them will see in the queue the
same aggregate number of cells as a random observer (n.b., this assumption
will strictly be true if the SCVs of the arriving streams are equal). Suppose that
a tagged cell belongs to class $i$ and its bulk finds the system in state $A_v^j$, $v =
1, 2, \ldots$. Thus the number of spaces available to the arriving bulk is $N_i - v$ and,
therefore, it follows that

P(a tagged cell is blocked & the bulk finds the system 
in state^v, n = 1,2, . . . , AT^, j = 1, 2, . . . ,i?) = (38) 

Unconditioning on all those states $A_v^j$ with $j = 1, \ldots, R$ yields

P(a class $i$ tagged cell is blocked & the bulk finds the system
in state $A_v^j$, $j = 1, 2, \ldots, R$)

= Ef=i (1 - p{A)). (39) 

Finally, taking into account the idle state $S_0$,

P(a tagged cell is blocked & the bulk finds the system
in state $S_0$) = $\delta_i (1 - \sigma_i)^{N_i} P(S_0)$, (40)

where $\delta_i = r_i/(r_i(1 - \sigma_i) + \sigma_i)$. Combining eqs (38)-(40), the following
expression for the probability $\pi_i$ is obtained:

R /Ni-l \ R Ni 

7T, = 5i(l-<r0'^P(5„) + E E +E E 

j=l V V=1 / j=l v=Ni 

(41) 

It is trivial to prove that, $\forall\, j = 1, 2, \ldots, R$,

P{A]) = - 1), = 1, . . . , 7Vi - 1. (42) 







Substituting (42) into (41), yields 



R /m-i \ 

j=l \i;=l / 

Ni 

+ E (43) 

v=Ni 

which, finally takes the form 

Nt 

m = P{v), (44) 

i ;=:0 

where $\delta_i(v) = \delta_i$ if $v = 0$ and $\delta_i(v) = 1$, $\forall\, v \geq 1$.
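
The GE-type quantities used in this section can be sketched in a few lines of Python. The sketch assumes the usual GE parameterisation, in which a stage selection probability equals $2/(C^2 + 1)$, and it weights each aggregate state $v$ by $\delta_i(v)(1 - \sigma_i)^{N_i - v}$, in line with the statement that $N_i - v$ spaces are available to the arriving bulk. The function names, the single-threshold summation range and the state probabilities in the example are illustrative assumptions; the multi-class summation limits of eqs (41)-(44) are not reproduced.

```python
def blocking_weights(scv_a, scv_s, capacity):
    """Per-state probability that a tagged cell of an arriving bulk is blocked."""
    sigma = 2.0 / (scv_a + 1.0)  # interarrival stage selection probability
    r = 2.0 / (scv_s + 1.0)      # service stage selection probability
    delta0 = r / (r * (1.0 - sigma) + sigma)  # correction for the idle state
    return [(delta0 if v == 0 else 1.0) * (1.0 - sigma) ** (capacity - v)
            for v in range(capacity + 1)]


def cell_loss(weights, state_probabilities):
    """Weighted sum of the aggregate state probabilities P(v)."""
    return sum(w * p for w, p in zip(weights, state_probabilities))


# Hypothetical example: SCVs of 3 and a buffer threshold of 2.
weights = blocking_weights(3.0, 3.0, 2)
loss = cell_loss(weights, [0.5, 0.3, 0.2])
print(weights, loss)
```

With both SCVs equal to 1 the weights reduce to zero for every state except the full one, which recovers the familiar Poisson case in which a cell is lost only when the buffer is full.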



4.2 Estimation of the Lagrangian Coefficients $\{y_i\}$

From the flow balance conditions (7) and by using expressions (22), (24)-(44),
the Lagrangian coefficients, $y_i$, $i = 1, 2, \ldots, R$, can be determined, after
some manipulation, by



( R /Ni-2 

_ ,=1 \.=0 

v=Ni-l 

^ Ni-2 

+1 - - -Qi^iXi E 

,,=0 

i=i v=i 

i = 1,2,. ..,R. 



4.3 Flow Formulae 

The ME approximation suggests a decomposition of the network into individual
multiple class GE/GE/1/$N_1, \ldots, N_R$ queues with merged arrival process
and revised service time for each class of cells. In order to implement the ME
solution under an abstract service discipline, the flow processes in the general
network should be determined. For each queue $k = 1, 2, \ldots, M$, it is
assumed that the arriving and departing streams per class $i$ cells, $i = 1, 2, \ldots, R$,
form renewal processes conforming with GE-type underlying interarrival
time and service time distributions. The mean rates and SCVs of the overall
interarrival and interdeparture processes of class $i$ at queue $k$ can be
determined as follows (c.f., Kouvatsos and Tabet-Aouel 1990):

The Interdeparture Process: 



(46) 

(47) 

The Split Process: 



^dki — ^kij 

Clki = 2^a(l-^«) + C'Li(l-2M- 



^mjki — ^mj^mjkiy 
^dmjki = 1 + hmjkiiCaki ~ !)• 



(48) 

(49) 



The Merging Process: 

M R 

^ki ~ EE ^mjki 5 



m=l j=l 



(50) 



r2 _ 
^aki — 



- 14 - 



M R 

EE 



I m=l j=l 



^mjki / ^2 ^ ^Oki 



i^aOki 




for $i, j = 1, 2, \ldots, R$ and $k, m = 1, 2, \ldots, M$.



(51) 



5 NUMERICAL RESULTS 

The credibility of the proposed ME algorithm is illustrated against simulation
via two typical numerical tests involving simple multistage interconnection
networks with GE-type external interarrival and transmission times,
single input/output ports and output queueing. Tables 1-2 present the input
data relating to central server and feedforward types of open QNMs,
respectively, with three nodes (switching components) and two HOL priority classes
of cells under PBS scheme. Cells entering and flowing in the network follow
a CPP with geometric batches, whilst, for tractability purposes, the GE
distribution with SCV equal to 0.5 is also used to model, in an approximate sense,
constant service times. Furthermore, two moment flow approximation formulae
are used based on the superposition and splitting of GE-type streams as
well as the interdeparture time distribution of a stable GE/GE/1 queue at
equilibrium (Kouvatsos 1994). For validation purposes, the following tolerances
(TOL) have been used:

For throughput, $X$, the TOL is defined by

$$\frac{\mathrm{SIM}(X) - \mathrm{MEM}(X)}{\mathrm{SIM}(X)}.$$

For the marginal mean queue lengths, $\langle n \rangle$, the TOL is defined by

$$\frac{\mathrm{SIM}(\langle n_{ki} \rangle) - \mathrm{MEM}(\langle n_{ki} \rangle)}{\mathrm{SIM}(\langle n_{ki} \rangle)},$$

and the TOL for the queue length distribution, $P(n)$, utilisation, $U$, and cell loss
probabilities, $P_c$, is defined as follows:

$$|\mathrm{SIM}(c_{ki}) - \mathrm{MEM}(c_{ki})|,$$

respectively, for each class $i$ and station $k$. The results of the validation study are presented
in Figures 1-6, which display the error tolerances between the ME algorithm and
simulation (based on QNAP-2 at 95% confidence interval (Veran and Potier
1985)) for the performance measures of throughput $X$, mean queue length $\langle n \rangle$,
utilisation $U$, idle and full buffer state probabilities and cell loss (or blocking)
probabilities. It can be observed that the ME solutions are very comparable
to those obtained by the corresponding simulation models. Note, however, that the
credibility of the ME solution gradually deteriorates as the service
time SCVs increase. This behaviour may be attributed to the fact that the underlying
interarrival and interdeparture renewal assumptions are further violated.
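
The tolerance measures defined earlier in this section are straightforward to compute. The Python sketch below shows one way to do so; the function names and the numerical values are hypothetical and are not results from the paper, and taking the absolute value in the relative measure is an assumption, since the printed formula for throughput and mean queue length shows no modulus.

```python
def relative_tolerance(sim, mem):
    """Relative error, used for throughput and marginal mean queue length."""
    return abs(sim - mem) / abs(sim)


def absolute_tolerance(sim, mem):
    """Absolute error, used for probabilities and utilisation."""
    return abs(sim - mem)


# Hypothetical simulation and ME values for one class at one station.
throughput_tol = relative_tolerance(2.0, 1.9)
loss_tol = absolute_tolerance(0.10, 0.12)
print(throughput_tol, loss_tol)
```

Absolute differences are the natural choice for probabilities, which may be very small and would otherwise produce misleadingly large relative errors.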
Experiment 1



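Table 1

Raw data: $M = 3$, $R = 2$, $N_1 = 10$, $N_2 = 6$

Queue 1: $\mu_1 = 6.25$, $\mu_2 = 25$, $C^2_{s1} = 2$, $C^2_{s2} = 3$,
         $\lambda_1 = 1$, $\lambda_2 = 1$, $C^2_{a1} = 3$, $C^2_{a2} = 3$,
         $\alpha_{1121} = 0.5$, $\alpha_{1131} = 0.5$, $\alpha_{110} = 0$,
         $\alpha_{1222} = 0.5$, $\alpha_{1232} = 0.5$, $\alpha_{120} = 0$

Queue 2: $\mu_1 = 3.75$, $\mu_2 = 5$, $C^2_{s1} = 3$, $C^2_{s2} = 5$,
         $\alpha_{2111} = 0.4$, $\alpha_{2131} = 0$, $\alpha_{210} = 0.6$,
         $\alpha_{2212} = 0.4$, $\alpha_{2232} = 0$, $\alpha_{220} = 0.6$

Queue 3: $\mu_1 = 4$, $\mu_2 = 12$, $C^2_{s1} = 3$, $C^2_{s2} = 6$,
         $\alpha_{3111} = 0.3$, $\alpha_{3121} = 0$, $\alpha_{310} = 0.7$,
         $\alpha_{3212} = 0.3$, $\alpha_{3222} = 0$, $\alpha_{320} = 0.7$


Experiment 2

Table 2

Raw data: $M = 3$, $R = 2$, $N_1 = 20$, $N_2 = 10$

Queue 1: $\mu_1 = 4$, $\mu_2 = 10$, $C^2_{s1} = 0.5$, $C^2_{s2} = 0.5$,
         $\lambda_1 = 1$, $\lambda_2 = 1$, $C^2_{a1} = 3$, $C^2_{a2} = 5$,
         $\alpha_{1121} = 0.5$, $\alpha_{1131} = 0.5$, $\alpha_{110} = 0$,
         $\alpha_{1222} = 0.5$, $\alpha_{1232} = 0.5$, $\alpha_{120} = 0$

Queue 2: $\mu_1 = 1$, $\mu_2 = 15$, $C^2_{s1} = 0.5$, $C^2_{s2} = 0.5$,
         $\alpha_{2111} = 0$, $\alpha_{2131} = 0$, $\alpha_{210} = 1.0$,
         $\alpha_{2212} = 0$, $\alpha_{2232} = 0$, $\alpha_{220} = 1.0$

Queue 3: $\mu_1 = 2$, $\mu_2 = 20$, $C^2_{s1} = 0.5$, $C^2_{s2} = 0.5$,
         $\alpha_{3111} = 0$, $\alpha_{3121} = 0$, $\alpha_{310} = 1.0$,
         $\alpha_{3212} = 0$, $\alpha_{3222} = 0$, $\alpha_{320} = 1.0$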



6 CONCLUSIONS 

A new ME product-form approximation is proposed for arbitrary open QNMs
with HOL priority classes under PBS scheme and RS-RD blocking. This
solution is implemented in terms of an iterative queue-by-queue decomposition
algorithm for the cost-effective estimation of typical performance metrics
such as cell loss and state probabilities, throughputs, utilisations and mean
queue lengths. The G/G/1/$N_1, \ldots, N_R$ queue is analysed recursively via
entropy maximisation and the generating function approach and plays the role
of an efficient building block in the solution process. Numerical experiments,
focusing on simple buffered interconnection networks under GE-type bursty
traffic and two moment flow approximation formulae, illustrate the high level
of accuracy of the ME algorithm against simulation. The ME methodology
can also be used to study discrete-time QNMs such as those based on the
Generalised Geometric (GGeo) distribution (Kouvatsos 1994) and the shifted
GGeo (sGGeo) process (Kouvatsos and Fretwell 1995) with applications to
multibuffer, shared buffer and shared medium ATM switch architectures with
or without correlated traffic under various buffer management policies. Work
of this kind is the subject of current studies.

Figure 1: Error tolerances of typical performance measure for
all jobs in the network.

Figure 2: Error tolerances of typical performance measure for
Class-1 jobs in the network.

Figure 3: Error tolerances of typical performance measure for

Figure 4 (Experiment 2, aggregate statistics): Error tolerances of typical
performance measure for all jobs in the network.

Figure 5: Error tolerances of typical performance measure for
Class-1 jobs in the network.

Figure 6: Error tolerances of typical performance measure for
Class-II jobs in the network.


Abstract

In this paper a flexible burst-level ATM traffic generator is presented. It is a 
PC-based system with a careful allocation of functions between hardware 
(PC ISA board) and software, in a way that allows it to work on-line at full- 
speed (155.52Mbps), on the one hand, and on the other, to be flexible 
enough to emulate a wide range of ATM traffic profiles. Up to 4 boards can 
be hosted by a single PC, each being able to generate up to 16 independent 
streams. Each stream consists of a continuous sequence of “traffic events” 
(burst-silence cycles), with each event being described by three parameters: 
the burst size, the silence duration and the inter-cell distance within the 
burst. The triplets, describing respective traffic events, are generated by 
software and downloaded on-line (through DMA) to the hardware; thus, in 
principle, any traffic model can be implemented, the limits being imposed 
only by the speed of the PC in relation to the total number of independent 
streams (up to 64) generated concurrently. For example, arbitrary 
distributions of the three traffic parameters (including experimental 
histograms) can be independently sampled; or correlated samples according 
to any law desired can be easily produced. Even further, real traffic can be 
emulated, provided that a monitoring device (e.g. a LAN traffic monitor) is 
connected to the PC feeding it with samples of the real traffic stream. 

Keywords 

ATM test tools, ATM traffic generation, traffic models, bursty traffic 




1 INTRODUCTION 



Building test tools for ATM networks is a challenging job from many points 
of view. Firstly, they have to produce and consume traffic at fairly high rates 
(155 Mb/s and 622 Mb/s are the two standardised rates at the ATM User- 
Network Interface, see (ITU-T, 1993A & 1993B)). The problem is not the
speed in itself. Since it is rather impractical or impossible to store and
process data off-line before injecting it into the network or after capturing it 
for analysis purposes (due to the huge amount of storage that would be 
required for that), most functions must be performed on-the-fly, requiring 
special hardware implementations. Secondly, a multitude of traffic profiles 
is encountered in an ATM network. In order to produce such diverse profiles 
artificially and/or analyse them with test tools, a great deal of flexibility is 
required by the latter, not easily available by hardware implementations. 

Another issue is cost. The target user group of test tools is not as large
as that of consumer electronics, for example. This fact, combined with the 
special requirements outlined above, keeps the cost prohibitively high for 
the common user. 

In this paper a flexible and cost-effective solution to the problem of 
ATM traffic generation is presented. Flexible, because the traffic profiles 
are produced in software and only the time-critical functions are 
implemented in programmable hardware (FPGA chips). Cost-effective, 
because the whole system is hosted in a standard PC board. 

Many traffic sources are bursty in nature, i.e. they produce blocks of 
information (bursts), rather than continuous-bit-rate streams. The latter are 
usually encountered within network nodes as a result of successive 
multiplexing stages or after a rate-shaping operation. Nevertheless, even 
after shaping or multiplexing which give rise to smooth or even constant- 
rate traffic flows, the bursts may still exist as distinguished entities within a 
stream and as such are delivered to their destination. 

Apart from the cases where segmentation has been applied (another 
traffic shaping operation aiming at enforcing contracted traffic profiles), 
each individual burst corresponds to a separate information unit (packet or 
block, e.g. a video frame or a Protocol Data Unit from upper layers in the 
overlying protocol stack). Subsequent processing at the receiver may only 
start after the entire burst (packet or information block) has been received. 
In such cases what is of interest from the network side is its performance on 
a burst level rather than on a cell level. 

ATM traffic generators that have appeared so far focus on cell-level 
performance (ADTECH 1997, ALCATEL STR 1994, WANDEL AND 
GOLDERMAN 1994). Even when they are able to emulate higher-level 
dynamics (e.g. producing Markov-Modulated-Rate-Processes) they do not
provide burst-level information (e.g. burst-sequence numbers) that would
enable respective analysis at the receiving end.

This paper presents the functionality and the architecture of an ATM 
traffic generator enabling burst-level traffic testing (Burst Level Traffic 
Generator - BLTG). Following this introduction, section 2 outlines the main 
characteristics of the developed traffic generation tool. Section 3 presents 
the source model of the generated traffic. Section 4 gives functional and 
architectural details. A high-level user interface, which allows the user
to define the traffic profile of each stream, is
described in section 5. In the same section detailed examples of traffic
profiles that can be produced easily by the generator are presented. Finally, 
section 6 summarises the main features of the BLTG and compares it with 
other similar traffic generators. 

2 THE MAIN FEATURES OF THE GENERATOR 

The proposed generator is a PC-based system (ISA board with the associated 
driving software) with a careful allocation of functions between hardware 
and software, in a way that it can work on-line at full speed, on the one 
hand, and on the other it is flexible enough to emulate a wide range of traffic 
profiles. Up to 16 independent ATM traffic streams per board, with an
aggregate rate up to 155.52 Mb/s, and up to 4 boards per PC can be
supported.

As described in more detail in the next section, each stream is defined in 
terms of a sequence of “traffic events” (burst-silence cycles), each event 
being described by three parameters: the burst size, the inter-cell distance 
within the burst and the silence duration. The software is responsible for 
producing sample values of the traffic parameters and passing them on-line 
to the hardware (through Direct Memory Access - DMA), while the 
hardware produces the actual traffic streams according to the samples 
generated by the software. For periodic traffic streams, the samples within a 
period need to be downloaded only once and the hardware keeps on working 
producing the same desired pattern periodically (“cyclic” mode of operation, 
in contradistinction to the “refresh” mode of the continuous downloading of 
traffic events). Moreover, the hardware generates and inserts appropriately 
additional information (time-stamps, sequence number of the cell within the 
burst, sequence number of the burst etc.) that is necessary for burst 
delineation and analysis of basic performance measures (delay, losses) at the 
receiving side. 

Another important feature of the proposed system (enabled by the
flexibility of the software-hardware co-design approach followed) is its
ability to inter-work with a real traffic source (e.g. a video card) and emulate
its profile, although with dummy data. All that is required in this case is a
traffic monitor, which will take measurements on the real traffic stream. If
the real source is also PC-resident, it is a matter of a simple software routine
to get these measurements and pass them to the generator's software. In
section 5 examples of traffic models that can be emulated by the generator
are presented.

In summary, the main features of the BLTG are the following: 

• Up to 4 boards can be hosted by one PC. 

• Up to 16 independent ATM cell streams per board, each one with a 
different traffic profile. Thus a total number of 64 independent streams 
can be generated by a single PC. 

• Supported rate up to 155.52 Mb/s per card.

• Supported rate for each stream from 4 kb/s up to 155.52 Mb/s.

• Full UTOPIA level 1 compliant interface, 155.52 Mb/s (ATM Forum
1994B).

• Many physical interfaces can be connected as daughter boards to the
generator’s mother board (e.g. single-mode or multi-mode physical
interface for STM-1/STS-3c/OC-3).

• Handling of signalling connections and signalling support (ATM Forum
3.1 and/or ITU-T Q.2931 Access Signalling, and ITU-T Q.2110
Signalling AAL for both user and network sides of attachment (ITU-T
1994, ITU-T 1995, ATM Forum 1994A)).

• Handling of bi-directional user data ATM connections with payload 
exchanged through the PC bus. 

• Generation of cells of 53 or 54 bytes in length.

• Support of special data fields within the payload in predefined positions: 
burst identifier, stream identifier, traffic generator card identifier, time 
stamp. 

• Priority mechanism that solves, in real time, collision problems that may
occur when more than one stream competes for the same cell-slot of the
ATM interface.

• Description of each traffic profile as a series of burst and silence 
intervals, provided by software. Zero silence values are possible, 
resulting in continuous streams (of Constant or Variable Bit Rate). 

• Any histogram of values, either derived on-line from closed-form 
expressions or from experimental data, can be sampled for the 
parameters of each of the 16 streams. 

• Ability to configure for CBR, VBR and ABR traffic types in the context
of the previous two characteristics. The ABR type of operation, in
particular, is configurable as easily as the VBR type, provided that the
PC, hosting the generator board, can accept and process Resource
Management (RM) cells.

• Modification of the traffic profiles in real time, according to the user 
demands. 

• Monitoring capabilities of the produced traffic (counting number of 
transmitted cells: - per cell stream; - over all cell streams in a card). 

• Open architecture of design to allow addition of new functionality. 

3 THE MODEL OF THE GENERATED TRAFFIC 

As mentioned before, each one of the sixteen streams that can be produced
by a single BLTG board is described as a series of burst and silence cycles
(active and idle periods), named traffic events. Each burst is described by
two parameters (Number of Cells -NC- and Inter-cell Distance -ID- within
the burst) and each silence period by its duration (Silence Duration -SD).
Periods are measured in ATM-slot time units. The traffic stream model and
the meaning of the three traffic parameters are schematically shown in
figure 1.

Figure 1 The BLTG traffic model



The software produces a sequence of traffic events for each stream and
delivers it to the hardware (batch-wise, as a response to respective
interrupts). The hardware, in turn, produces actual ATM cell traffic
corresponding to the downloaded traffic event sequence: for each traffic
event it generates and transmits over the physical interface a number of
ATM cells equal to NC, placed ID ATM slots apart, and then applies a
silence for an interval equal to SD ATM slots. Evidently, this way of
traffic stream construction allows in principle the implementation of any
desired traffic model. It is up to the software to convert the model
parameters into a respective sequence of traffic-event triplets $[NC_i, ID_i, SD_i]$.
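
The way a sequence of traffic-event triplets maps to cell emission times can be illustrated with a short Python sketch. The function name `cell_slots` is hypothetical, and the convention that the silence interval is counted after the last inter-cell gap of the burst is an assumption made for illustration; the hardware may delimit the two intervals differently.

```python
def cell_slots(events):
    """Yield the ATM slot index of every cell for (NC, ID, SD) traffic events."""
    slot = 0
    for number_of_cells, inter_cell_distance, silence_duration in events:
        for _ in range(number_of_cells):
            yield slot
            slot += inter_cell_distance  # next cell is ID slots later
        slot += silence_duration         # idle period closing the event


# Two hypothetical events: 3 cells every 2 slots then 4 idle slots,
# followed by 2 back-to-back cells with no silence.
schedule = list(cell_slots([(3, 2, 4), (2, 1, 0)]))
print(schedule)
```

A generator is used so that an arbitrarily long event sequence can be expanded lazily, mirroring the batch-wise delivery of events to the hardware.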
The following cases are already supported by the current generator software 
(see also section 5). 

3.1 Independent burst and silence samples from arbitrary 
distributions 

This is the simplest albeit quite general model of traffic event generation. 
Two distributions are maintained as look-up tables in the main memory, one 
for the burst size and another one for the silence duration. The distributions 
are sampled independently to give the respective parameter values for each 
new traffic event. The third parameter, namely the intercell distance within 
the bursts, can be either constant or variable (drawn from a third distribution 
or determined otherwise, e.g. as a response to RM cells for an ABR-type of 
operation). The distributions can originate either from known statistical 
models (e.g. normal, exponential, uniform etc.) or from histograms derived 
experimentally. As will be exemplified in section 5, streams of trivial or
widely-used profiles (e.g. Constant Bit Rate - CBR, periodic On/Off) can be
easily programmed as special cases of this traffic class.
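
A minimal Python sketch of this model is given below, assuming that each distribution is held as a discrete look-up table of values and weights. The helper names, the histogram contents and the seed are hypothetical and are not taken from the generator software; the last line shows CBR as the special case of one-cell bursts with zero silence.

```python
import random


def histogram_sampler(values, weights, rng):
    """Return a function that draws one value from a discrete look-up table."""
    def draw():
        return rng.choices(values, weights=weights, k=1)[0]
    return draw


def independent_events(draw_burst, draw_silence, inter_cell_distance, count):
    """Build (NC, ID, SD) triplets from independently sampled bursts and silences."""
    return [(draw_burst(), inter_cell_distance, draw_silence())
            for _ in range(count)]


rng = random.Random(1997)  # fixed seed so the run is repeatable
draw_burst = histogram_sampler([10, 50, 200], [0.6, 0.3, 0.1], rng)
draw_silence = histogram_sampler([0, 100, 1000], [0.2, 0.5, 0.3], rng)
events = independent_events(draw_burst, draw_silence, 4, 8)

# CBR as a special case: one cell every 10 slots, no silence.
cbr = independent_events(lambda: 1, lambda: 0, 10, 3)
print(events, cbr)
```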

3.2 Independent burst samples, dependent silence intervals

This scenario may arise in cases of burst-level traffic shaping (see e.g. 
Mitrou N., 1998), where the silence enforced after the transmission of a 
burst is deterministically calculated as a function of the preceding burst size. 
Again, a distribution is sampled for the burst size, independently for each 
new event, while the silence to follow is calculated according to the desired 
shaping law. 

3.3 Correlated traffic 

Any model giving correlated traffic (e.g. Auto-Regressive Moving 
Average - ARMA) can be implemented by the software to give sequences of 
traffic events. The rest (sequence downloading and generation of the cell 
stream by the hardware) is performed as in the other cases. 
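
As one simple member of the ARMA family, a first-order autoregressive recursion can be used to produce correlated burst sizes. The sketch below is only an illustration under assumed parameters (`mean`, `phi`, `sigma` and the clamping of NC to the range 1..255 are hypothetical choices, not values taken from the BLTG software).

```python
import random


def ar1_burst_sizes(n, mean=20.0, phi=0.8, sigma=4.0, seed=1):
    """Return n correlated burst sizes from x[t] = mean + phi*(x[t-1]-mean) + noise."""
    rng = random.Random(seed)
    x = mean
    sizes = []
    for _ in range(n):
        x = mean + phi * (x - mean) + rng.gauss(0.0, sigma)
        sizes.append(max(1, min(255, round(x))))   # assumed one-byte NC range
    return sizes


if __name__ == "__main__":
    # Pair each burst size with a constant ID and SD to form traffic events.
    events = [(nc, 1, 50) for nc in ar1_burst_sizes(8)]
    print(events)
```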



3.4 Periodic traffic 

Periodic On/Off traffic can be produced as a sub-case of 3.1 above, by 
choosing constant values for the three parameters. More general patterns, 
however, can be directly defined by the user as sequences of triplets 
describing respective traffic events. Provided that the number of such 
triplets is a sub-multiple of the batch size which is downloaded to the 
hardware (512 in the current implementation), the batch is downloaded only 
once and periodically processed by the hardware. This mode of (periodic) 
traffic generation is enabled by marking the respective traffic streams, 
during initialisation, as being in the CYCLIC condition (see section 5.1, 
User Interface). 



4. FUNCTIONAL DESCRIPTION AND ARCHITECTURAL 
DETAILS 

Figure 2 gives a top-level block diagram of the BLTG. As stated above, the 
software part is responsible for maintaining a pool of parameter samples 
derived either from closed-form distributions, experimental histograms, 
user-defined sequences or by interfacing with real-traffic monitors. The 
hardware part pumps data out of the pool and forms bursts of cells 
accordingly, which are sent to the network over the Physical ATM interface. 
In order to avoid “sample starvation” at the hardware, double buffering is 
essentially performed on it: a new_samples request is issued whenever the 
buffer of a particular stream is half-emptied. 
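
The half-buffer refill rule can be illustrated with a small software model. This is a sketch under simplifying assumptions: the refill is performed synchronously (in the real system it follows an interrupt to the host), and the class and attribute names are hypothetical.

```python
BATCH = 512          # events held per stream in the hardware memory
HALF = BATCH // 2    # refill granularity (256 events)


class StreamBuffer:
    """Toy model of one stream's event memory with half-buffer refills."""

    def __init__(self, source):
        self.source = source                              # iterator of (NC, ID, SD)
        self.mem = [next(source) for _ in range(BATCH)]   # initial full batch
        self.pos = 0                                      # next event to be read
        self.requests = 0                                 # refill requests issued

    def _refill(self, start):
        # Stand-in for the interrupt and the download of 256 fresh triplets.
        self.requests += 1
        for i in range(start, start + HALF):
            self.mem[i] = next(self.source)

    def next_event(self):
        event = self.mem[self.pos]
        self.pos = (self.pos + 1) % BATCH
        if self.pos == HALF:      # lower half consumed: refill it
            self._refill(0)
        elif self.pos == 0:       # upper half consumed: refill it
            self._refill(HALF)
        return event
```

While the reader works through one half of the memory, the other half is being replaced, so events are always delivered in the order in which the software produced them.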



4.1 The software 

A functional description of the generator’s software is shown in fig. 3. The 
software consists of two successive parts: one part aiming at the preparation 
of the traffic profiles (parameter distributions or user-defined event 
sequences) in the desired format, which is run off-line, and a second part 
which implements the on-line functions necessary for initialising the traffic 
streams and for driving the hardware. Vertically, the software splits into two 
branches: one for the distribution-based streams, where new batches of 
traffic events are continuously downloaded to the hardware 
(REFRESH-conditioned streams), and a second one for the cyclic streams. For 
the latter, a single batch of traffic events is downloaded during activation, 
which is thereafter repeated periodically by the hardware (CYCLIC-conditioned 
streams). 




Figure 2 Block diagram of the Burst-Level Traffic Generator 

Figure 3 Block Diagram of the BLTG Software 



Preparation of parameter distributions 

The method of producing samples in accordance with a specific Probability 
Distribution Function (PDF), say F(x), is that of randomly sampling the 
inverse of the desired PDF, F^{-1}(y) (see e.g. Law A.M. and Kelton W.D., 
1982). For the random-number generator one can either use the standard 
random function available in almost all programming languages, or implement 
one's own pseudo-random generator (for better accuracy and/or speed). Fig. 4 
depicts such a generator, implemented as a linear-feedback shift register, 
and how it is used to give samples of the distribution F(x). The 13 LSBs of 
the shift register are used as the address of a look-up table, containing 
(2^13 entries of) the inverse PDF, F^{-1}(y). 

Figure 4 A linear-feedback pseudo-random number generator and PDF sampling 
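
The look-up scheme of Fig. 4 can be sketched in Python as follows. The register width (16 bits) and the feedback taps (16, 14, 13, 11, a commonly used maximal-length choice) are assumptions, since the text does not specify them; the exponential table is just one example of an inverse PDF available in closed form.

```python
import math

TABLE_BITS = 13
TABLE_SIZE = 1 << TABLE_BITS      # 2^13 = 8192 table entries


def lfsr16_step(state):
    """Advance an assumed 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one step."""
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def exponential_inverse_table(mean):
    """Tabulate F^{-1}(y) = -mean*ln(1-y) at the mid-points of 8192 equal y-intervals."""
    return [-mean * math.log(1.0 - (k + 0.5) / TABLE_SIZE) for k in range(TABLE_SIZE)]


def sample(state, table):
    """Step the register and use its 13 LSBs as the table address."""
    state = lfsr16_step(state)
    return state, table[state & (TABLE_SIZE - 1)]


if __name__ == "__main__":
    table = exponential_inverse_table(mean=10.0)
    state = 0xACE1                # any non-zero seed
    for _ in range(5):
        state, x = sample(state, table)
        print(round(x, 2))
```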

The off-line software has to produce tables of the inverse PDF (inverse 
cumulative histograms). In some cases (like for exponential or uniform 
distributions) such an inverse function is provided in closed form and the 
job is fairly simple. In other cases, however, like that of a normal or an 
Erlang-type distribution, F^{-1}(y) is not given in closed form. Even worse, the 
user may have available only a histogram from experimental measurements. 
If the distribution y=F(x) is provided in closed form, the inversion is 
performed at discrete points of the y-axis by a trial-and-error procedure (e.g. 
through dividing the interval of the x-axis, containing the sought value, 
successively by two and deciding on the new interval through comparisons; 
the final F^{-1}(y) value is determined by a simple linear interpolation). If even 
the F(x) is not available in closed form (e.g. in the case of experimental 
histograms), a cumulative histogram is calculated and inverted at desired 
points through an inversion procedure similar to the above. An example of a 
PDF, a discrete histogram and the way of their approximate inversion is 
schematically depicted in fig. 5. 
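
The two inversion procedures can be sketched as follows. This is a minimal illustration, not the actual off-line software: the function names, the number of bisection steps and the mid-point choice of the y-values are assumptions.

```python
import math


def invert_cdf(F, y, lo, hi, iters=20):
    """Invert a closed-form F at y: halve [lo, hi] repeatedly, then interpolate."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) < y:
            lo = mid
        else:
            hi = mid
    f_lo, f_hi = F(lo), F(hi)
    if f_hi == f_lo:
        return lo
    return lo + (y - f_lo) * (hi - lo) / (f_hi - f_lo)   # linear interpolation


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal distribution function, an example without closed-form inverse."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def inverse_table_from_histogram(counts, size):
    """Invert the cumulative histogram of counts[v] (v = 0, 1, ...) at size points."""
    total = float(sum(counts))
    table = []
    value, cum = 0, counts[0]
    for k in range(size):
        y = (k + 0.5) / size
        while cum / total < y:        # advance to the first value whose CDF reaches y
            value += 1
            cum += counts[value]
        table.append(value)
    return table


if __name__ == "__main__":
    print(invert_cdf(normal_cdf, 0.975, -10.0, 10.0))
    print(inverse_table_from_histogram([1, 1, 2], 4))
```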

In all cases, the output of this part of the software is a file with the 
desired inverse cumulative histograms for the three traffic parameters. 

Figure 5 Probability Distribution and discrete histogram inversion 



Preparation of user-defined sequences 

As an alternative to the procedure of 4.1.1 above (Preparation of parameter 
distributions), the user may directly enter sequences of parameter values (in 
triplets [NC, ID, SD], one for each traffic event). This can be done by using 
any text editor and storing the sequences in files with an appropriate naming 
convention. The size of these user-defined sequences of events should be a 
sub-multiple of 512 (i.e. a power of 2, with exponent from 0 to 9), so that, by 
repeating itself, the sequence fills up a complete batch of 512 events, as 
required by the hardware. 
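
The tiling of a short user-defined sequence into a full batch can be sketched as below; the function name and the error handling are illustrative assumptions.

```python
BATCH_SIZE = 512   # events per batch, as required by the hardware


def fill_batch(sequence):
    """Repeat a user-defined sequence of triplets until it fills one batch."""
    n = len(sequence)
    if n == 0 or BATCH_SIZE % n != 0:
        raise ValueError("sequence length must be a sub-multiple of 512")
    return list(sequence) * (BATCH_SIZE // n)


if __name__ == "__main__":
    batch = fill_batch([(10, 1, 30), (5, 0, 100)])
    print(len(batch))
```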

The on-line procedures 

The on-line part of the software starts with the initialisation of the traffic 
sources, by associating a traffic profile to each one of them. The respective 
profiles are then loaded into the main memory as tables containing the 
inverse cumulative histograms or the user defined sequences of events (three 
such tables for each stream). A first batch of traffic events (512 triplets of 
NC, ID and SD samples) for each initialised stream is downloaded to the 
hardware, which is then activated. On a hardware request basis, new batches 
(256 triplets of NC, ID and SD samples) are formed and downloaded for the 
distribution-based (REFRESH-conditioned) streams. 

The Interface Driver & Controller module, shown in fig. 3, is used for 
the communication between the hardware and the software. It recognises 
interrupts generated by the hardware and downloads batches of traffic event 
samples for the respective traffic streams. 

4.2 The hardware 

The architecture of the Burst Level Traffic Generator (BLTG) hardware is 
presented in figure 6. It consists of the following logical modules: 

PC Interface Module 

A DMA-based PC interface for a 16-bit ISA bus is used to provide high-rate 
data exchange from the host to the TGC. This module is also responsible for 
passing interrupts from the TGC to the host processor. Interrupts are 
generated when one or more of the storage units associated with one stream 
are half-empty. 

Event Storage Memory (ESM) 

This is the memory where NC, ID, SD triplets for a number of traffic events 
(512) for each stream are stored. The ESM is divided into 16 different 
segments each one associated with one stream. This memory is implemented 
as a high-speed Dual Port RAM. 

Input Output Unit (IOU) 

This module reads sequentially from the ESM each NC,ID,SD triplet for the 
current traffic event for each one of the sixteen streams, stores internally the 
address of the next traffic event and generates an interrupt when half of the 
traffic events (i.e. 256) for one stream have been already processed. 

Interrupt Storage Unit (ISU) 

This is a temporary storage space intended to queue the interrupts generated 
by the IOU and used by the PC interface module. The ISU is implemented 
as a FIFO memory. 

Master Control Unit (MCU) 

This provides the control and synchronisation of the whole system. 

Source Control Unit (SCU) 

This module performs all functions for the traffic generation for each one of 
the sixteen streams. The architecture of this module is presented in figure 7. 

Header Storage Memory (HSM) 

This is a memory where a five-byte-header for each one of the 16 different 
streams is stored by the host, during initialisation. The CRC header field is 
calculated by the software part. 
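
The "CRC header field" is presumably the Header Error Control (HEC) octet of the ATM cell header. In standard ATM the HEC is a CRC-8 with generator polynomial x^8 + x^2 + x + 1 computed over the first four header octets and then XORed with 01010101. The sketch below follows that standard rule for a UNI header; whether the BLTG software computes the field in exactly this way is not stated in the text, and the function names are hypothetical.

```python
def atm_hec(header4):
    """Standard ATM HEC: CRC-8 (x^8 + x^2 + x + 1) of 4 octets, XORed with 0x55."""
    crc = 0
    for byte in header4:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ 0x55


def uni_header(vpi, vci, pt=0, clp=0, gfc=0):
    """Build a five-byte UNI header: GFC(4) VPI(8) VCI(16) PT(3) CLP(1) + HEC."""
    word = (gfc << 28) | (vpi << 20) | (vci << 4) | (pt << 1) | clp
    first4 = word.to_bytes(4, "big")
    return first4 + bytes([atm_hec(first4)])


if __name__ == "__main__":
    print(uni_header(vpi=1, vci=32).hex())
```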

Traffic Multiplexing Unit (TMU) 

This module resolves collisions when more than one stream attempts to 
transmit a cell in the same cell-slot. 

Cell Generation Unit (CGU) 

The cell selected by the TMU is generated in this module. As already 
mentioned, the cell payload comprises some special fields, as shown in 
figure 7. These fields are used for analysis purposes at the traffic analyser 
side. 

Cell Transmission Unit (CTU) 

This module transmits the cell generated by the CGU to a standard UTOPIA 
interface, as recommended by ATM Forum. 

User Information Unit (UIU) 

This unit is responsible for sending and receiving ATM cells with a 
user-defined information field passing through the ISA bus. The UIU has 
priority over the CTU in accessing the UTOPIA i/f. Note that the CTU module 
only sends cells to the UTOPIA i/f (generation process), while the UIU sends 
and receives cells conveying user-defined payload. 



4.3 Generated traffic cell format 

As already mentioned, the BLTG inserts some special fields in the ATM 
cell payload as produced by the CGU (this is not the case for the UIU, where 
the ATM cell payload is user defined and passed through the ISA bus). 
These fields (inserted by the CGU, Cell Generation Unit) are depicted in fig. 
7 and described below. 

Stream Code (SC) 

This is a 4-bit field containing the identifier of the stream that generates this 
cell. Value xxxx is source #0, value yyyy source #15. 

Generator Code (GC) 

This is a 4-bit field and contains the identifier for the specific BLTG board. 



Header   (5 bytes) 
SC       (4 bits) 
GC       (4 bits) 
ID       (1 byte) 
NoC      (1 byte) 
CSN      (1 byte) 
SD       (2 bytes) 
TS       (3 bytes) 
BSN      (1 byte) 
DM       (38 or 39 bytes) 

Figure 7 Cell format provided by the BLTG 

Through the SC & GC fields each source cell stream can be identified not 
only through its VP/VC identifier (resident in the header) but also inside 
the payload (16 cell sources in 16 different BLTG boards). 

Inter-cell Distance (ID), Number of Cells (NoC) and Silence Duration (SD) 

These fields fully describe the current Traffic Event of the cell stream (see 
Fig. 1). They are the same for all the cells within the same burst. 

Cell Sequence Number (CSN) 

This field specifies the sequence number of the cell in the current burst. It 
ranges from 0 to 255. 

Time-Stamp (TS) 

This 3-byte time stamp field gives the indication of the generation time by 
enumerating ATM slots and inserting the current number in the cell 
occupying the particular slot. 

Burst Sequence Number (BSN) 

This field specifies the sequence number of the burst (the Active Period of a 
Traffic Event) within the train of Traffic Events of the particular source. It 
is used for burst delineation at the analyser side and is the same for 
all cells of the specific source in the current burst. 

Dummy Data (DM) 

This field is created by a pseudo-random number generator. The length of 
this field is 38 or 39 bytes, depending on the total length of the cell (53 or 
54 bytes). 
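
The payload layout of figure 7 can be mimicked in software, for instance on the analyser side. The sketch below assumes that the fields appear in the order listed in the figure, that multi-byte fields are big-endian and that SC occupies the upper four bits of the first payload byte; none of these details is confirmed by the text. The dummy data are drawn from `os.urandom` merely as a stand-in for the hardware pseudo-random generator.

```python
import os
import struct

FIXED_LEN = 10   # SC/GC (1) + ID (1) + NoC (1) + CSN (1) + SD (2) + TS (3) + BSN (1)


def build_payload(sc, gc, id_, noc, csn, sd, ts, bsn, length=48):
    """Assemble a 48-byte (or 49-byte) payload with the BLTG analysis fields."""
    head = struct.pack(">BBBBH", ((sc & 0xF) << 4) | (gc & 0xF), id_, noc, csn, sd)
    head += ts.to_bytes(3, "big") + bytes([bsn])
    return head + os.urandom(length - FIXED_LEN)     # DM: 38 or 39 dummy bytes


def parse_payload(payload):
    """Recover the analysis fields from a received payload."""
    first, id_, noc, csn, sd = struct.unpack(">BBBBH", payload[:6])
    return {
        "SC": first >> 4, "GC": first & 0xF, "ID": id_, "NoC": noc, "CSN": csn,
        "SD": sd, "TS": int.from_bytes(payload[6:9], "big"), "BSN": payload[9],
        "DM": payload[FIXED_LEN:],
    }


if __name__ == "__main__":
    fields = parse_payload(build_payload(3, 1, 2, 10, 4, 300, 70000, 7))
    print({k: v for k, v in fields.items() if k != "DM"})
```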



5 USER INTERFACE, TRAFFIC PROFILE EXAMPLES AND 
TYPICAL TESTING CONFIGURATIONS 

5.1 User interface 

A friendly interface has been developed to help the user in creating traffic 
profiles and programming the BLTG hardware. Fig. 8 is an instance of the 
Traffic Profile Creation dialog, part of this interface. As shown in the figure, 
the user can select among 4 known distributions (constant, exponential, 
normal and uniform), independently for each of the three traffic event 
parameters of each stream. A fifth choice (bullet “Other”, in fig. 8) gets as 
input an arbitrary distribution, stored as a discrete histogram in a file. Such 
histograms may be derived in various ways (e.g. from closed-form 
probability density functions, experimental measurements, etc.). In all cases, 
the off-line software will automatically create the respective inverse 
cumulative histograms and store them in files for later use by the on-line 
part. The VIEW TRAFFIC PROFILE option in fig. 8 gives the user a visual 
representation (e.g. histograms) of the created traffic profile. 




Figure 8 Example of Traffic Profile Creation dialog 



The user interface for the on-line software is also friendly and flexible. It 
allows the user to view and handle the complete list of one board’s sources, 
including their type (CBR, VBR, UBR), status (ACTIVE or not), condition 
(CYCLIC or REFRESH) and VPI-VCI assignments (fig. 9). On the same 
screen the user has some on-line monitoring facilities for the actual traffic 
streams generated by the hardware: in the last column of the screen, as in 
fig. 9, the user can see the number of cells actually transmitted by each 
source since its activation. In order to support long observation periods, 
64-bit counters have been used (number of cells between 0 and 1.84e+19). 

Figure 9 Traffic Generator status and control dialog 



5.2 Examples of traffic profiles 

In this section some of the best-known ATM traffic models are translated 
into the BLTG model. The selected examples aim at demonstrating the 
flexibility of our traffic generator rather than exhaustively covering its 
scope of applicability. 

Constant Bit Rate (CBR) streams 

This type of traffic can be easily produced by using any distribution for the 
burst size, a constant inter-cell distance that corresponds to the desired rate 
and a zero silence. Obviously, a CYCLIC stream with just a single triplet 
[NC any, ID, SD=0] written in the user-defined-sequence file does the job 
with the minimum resources. 

CBR with jitter 

This pattern arises when a CBR stream passes through successive 
multiplexing stages. With the assumption of a large number of independent 
such stages, a normal distribution is a good model of the resulting jitter 
(= deviation from the ideal CBR cell positioning). Jittered CBR streams can be 
produced by the BLTG by using NC=1, ID=0 and SD distributed 
according to the intercell distance of the jittered stream, e.g. 
$SD = \max\{0, [1/r - 1 + d]\}$, where r is the desired constant rate in 
cells/slot, d a normally distributed variable with zero mean and [y] denotes 
the integer part of y. 
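
The jittered-CBR recipe translates into a few lines of Python. The sketch assumes NC=1 and ID=0 as stated above; the standard deviation `sigma` of the jitter and the function name are hypothetical parameters introduced for illustration.

```python
import math
import random


def jittered_cbr_events(n, rate, sigma, seed=0):
    """Return n events (NC=1, ID=0, SD) with SD = max(0, [1/rate - 1 + d])."""
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        d = rng.gauss(0.0, sigma)                      # zero-mean normal jitter
        sd = max(0, math.floor(1.0 / rate - 1.0 + d))  # integer part, clipped at 0
        events.append((1, 0, sd))
    return events


if __name__ == "__main__":
    print(jittered_cbr_events(5, rate=0.1, sigma=2.0))
```

With `sigma=0` the silence is constant and the stream degenerates to plain CBR with one cell every 1/rate slots.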

Periodic traffic streams 

Any periodic cell stream, with a pattern that can be mapped onto a sequence 
of BLTG events [NC_i, ID_i, SD_i], i=1,2,...,M, M=2^m, m ∈ {0,1,...,9}, can be 
produced by the BLTG as a CYCLIC stream. Notice that the period duration, 
expressed as 

$D = \sum_{i=1}^{M} \left[ (ID_i + 1) \cdot NC_i + SD_i \right]$, 

may become extremely large, depending on the values of the individual event 
parameters. 

Markov-Modulated Rate Processes (MMRP) 

Such processes have been widely used in the literature for modelling ATM 
traffic, due to their flexibility and analytical tractability (Stern T.E. and 
Elwalid A.I. 1991, Elwalid A.I. and Mitra D. 1993). An underlying Markov 
chain drives the visiting of the source to different states, each being 
characterised by a certain cell rate. The sojourn time at each state is 
exponentially distributed (Markovian behaviour). An MMRP can be easily 
mapped on the BLTG traffic model by forming a traffic event for the visit of 
the source to a new state. The Markov chain is implemented to give the 
transitions between the states and, hence, the ID values for the respective 
BLTG events. An exponential distribution is sampled for the corresponding 
sojourn times, which, divided by the respective ID, give the NC values. The 
silence duration, SD, of all the events is set equal to zero. The procedure of 
realising an MMRP stream is schematically shown in fig. 10. 
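
A minimal sketch of this mapping is given below. It follows the description literally (NC is the sojourn time divided by the ID of the state, SD is zero); the argument names, the use of a jump-chain transition matrix and the lower bound of one cell per event are assumptions made for illustration.

```python
import random


def mmrp_events(n, ids, transition, mean_sojourn, start=0, seed=0):
    """Generate n traffic events (NC, ID, 0) from a Markov-modulated rate source.

    ids[s]          intercell distance characterising state s (must be > 0)
    transition[s]   jump probabilities from state s to every state
    mean_sojourn[s] mean sojourn time in state s, in ATM slots
    """
    rng = random.Random(seed)
    state = start
    events = []
    for _ in range(n):
        sojourn = rng.expovariate(1.0 / mean_sojourn[state])
        nc = max(1, round(sojourn / ids[state]))   # cells sent during this visit
        events.append((nc, ids[state], 0))
        state = rng.choices(range(len(ids)), weights=transition[state])[0]
    return events


if __name__ == "__main__":
    print(mmrp_events(6, ids=[1, 4], transition=[[0, 1], [1, 0]],
                      mean_sojourn=[100.0, 400.0]))
```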




Figure 10 Producing traffic events of MMRP streams 

Exponential On/Off 

This is a special MMRP, as above, featuring only two states: the On state, 
with the source transmitting at a constant rate, and the Off state, with the 
source being silent. The duration of both states is assumed exponentially 
distributed. This model is directly implemented on the BLTG, by using two 
exponential distributions for the NC and SD parameters, respectively, and a 
constant-valued ID (corresponding to the desired rate of the On state). 

Traffic with correlated burst and silence duration 

As mentioned in section 3.2, this type of profile may arise when applying 
burst-level traffic shaping. For example, if we want to control the mean rate, 
we should enforce, after each burst, a silence proportional to the size of the 
preceding burst. Similarly, if we want to keep the effective rate constant (as 
defined through appropriate burst-level modelling), a silence calculated 
directly in terms of the preceding burst size and the desired effective rate 
and Quality of Service (QoS) figures should be enforced (Mitrou N. 1998, 
Mitrou N. and Kavidopoulos K. 1998). In such cases, the table holding the 
SD distribution is directly calculated by applying the desired SD-NC 
dependency law on the NC table. The latter is derived from the respective 
distribution as usual. During the on-line procedure of forming traffic event 
samples, both tables are sampled at once (with the same random index) to 
give suitable pairs of NC,SD values. 
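
The paired-table mechanism can be sketched as follows for the mean-rate example. The shaping law used here is an assumed instance: it takes the event duration to be (ID + 1)*NC + SD slots and chooses SD so that every event carries exactly the target mean rate, which makes SD proportional to NC. The function names are hypothetical.

```python
import math
import random


def mean_rate_silence(nc, id_, mean_rate):
    """Silence (in slots) so that NC cells per event correspond to mean_rate cells/slot."""
    return max(0, math.ceil(nc * (1.0 / mean_rate - (id_ + 1))))


def paired_sd_table(nc_table, id_, mean_rate):
    """Derive the SD table entry by entry from the NC table."""
    return [mean_rate_silence(nc, id_, mean_rate) for nc in nc_table]


def sample_events(nc_table, sd_table, id_, n, seed=0):
    """Sample both tables with the same random index to keep NC and SD paired."""
    rng = random.Random(seed)
    events = []
    for _ in range(n):
        k = rng.randrange(len(nc_table))
        events.append((nc_table[k], id_, sd_table[k]))
    return events


if __name__ == "__main__":
    nc_table = [1, 2, 4, 8]
    sd_table = paired_sd_table(nc_table, id_=0, mean_rate=0.25)
    print(sample_events(nc_table, sd_table, 0, 5))
```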

It should be emphasised again that the above cases provide only some 
typical examples of the BLTG usage. Since the traffic events of each stream 
are calculated by software, any traffic model can be implemented, in 
principle, provided that it doesn’t impose excessively heavy computations 
that could violate the time limits set by the on-line procedures. 



5.3 Typical testing configurations 

Figure 11 shows how the BLTG can be used in 
experimental configurations for performance testing. A number of BLTG 
boards can be used to load the network (or a particular network component) 
with background traffic, while other board(s) will produce test traffic. The 
test traffic can be of an artificial type, or an emulation of a real traffic source 
(e.g. of a LAN). An ATM Traffic Analyzer (TA), connected to the network, 
will conduct measurements on the received test traffic concerning various 
measures of interest (cell or burst losses, cell or burst delays etc.). 

Figure 11 Typical performance testing configuration using the BLTG 



6 CONCLUSION AND COMPARISONS 

A new ATM traffic generation tool was presented. It is a PC-based system 
(ISA board with the associated user interface and driving software), with a 
careful allocation of functions between hardware and software. The 
adopted software-hardware partitioning allows functioning at full speed, on 
the one hand, and a flexible implementation of a variety of traffic models, 
on the other. The emphasis is on burst-level modelling, hence the name 
Burst Level Traffic Generator (BLTG). A traffic stream is defined as a 
sequence of burst-silence cycles (traffic events) described by three 
parameters, which are downloaded on-line and processed by the hardware. 
This architecture allows not only the implementation of sophisticated traffic 
models (e.g. MMRPs of any order or correlated traffic), but also the 
generation of traffic in a “dynamic” style. For example, the generator can 
emulate real traffic streams by communicating on-line with appropriate 
monitoring devices; or it can change the traffic profiles according to 
feedback information from the network (e.g. ABR-type of operation), 
provided that appropriate facilities for receiving and processing this 
feedback (RM cells) are installed on the PC hosting the BLTG. 

Compared to other ATM traffic generation tools (ADTECH 1997, 
ALCATEL STR 1994, WANDEL AND GOLDERMAN 1994), the BLTG 
features the following advantages: 

• Up to 16 independent traffic streams can be produced and multiplexed on 
the same physical interface (compare this, for example, with the 8 
independent streams available with the AX/4000 (ADTECH 1997)). Four 
such boards can be hosted by a single PC, leading to up to 64 independent 
sources per PC. 

• The traffic streams produced by the three tools mentioned above are 
static, in the sense that, once programmed (during initialisation), they 
follow the same statistics. With the BLTG, due to its pioneering design 
with suitable partitioning of functions between software and hardware, 
the user has the flexibility to change the traffic profiles "on the fly" (i.e. 
without re-initialising the generator). 

• The previous feature becomes even more important in cases of testing 
ATM services which are adaptable to network feedback (e.g. ABR-type 
of service). The authors believe that the current architectures of the other 
tools mentioned above could not support ABR, while in the BLTG case it 
is a matter of simple software to interpret RM cells and adjust the 
profiles accordingly. 

• The BLTG directly supports two levels of traffic statistics, the cell level 
and the burst level, by inserting appropriate information into the payload 
of the transmitted cells, allowing respective analysis at the receiving side. 
The other tools mentioned above, operating basically on a cell level, can 
incorporate only implicitly (e.g. with the three-state model supported by 
the AX/4000) burst-level statistics. 

• The power of the BLTG lies in its ability to produce the traffic profiles 
on-line by software, without sacrificing full-speed operation. A good 
example of its flexibility is the potential to interwork with a traffic 
monitoring device or a real traffic source (e.g. a real-video card) and pass 
samples of traffic parameters to its hardware for real-traffic emulation. 

Abstract 

Together with measurements and analytical methods, the simulation-based 
evaluation of cellular systems will be increasingly important as the 
deployment of new mobile applications imposes new requirements both on the 
radio interface and on the fixed network infrastructure. Efficient allocation 
of the network’s resources must be based on reliable and flexible performance 
evaluation techniques. In this paper we describe a simulation environment 
optimized for the performance analysis of wideband cellular networks. To 
handle the complexity of the system without losing low-level details to a 
high-level abstraction, a hierarchical simulation structure is developed, which 
also relies largely on the integration of analytical results into the 
simulation. The resulting structure can simulate large and complex systems 
efficiently (both in terms of simulation run time and in terms of modeling 
flexibility and speed), while the level of abstraction can be freely selected 
by the user within a wide range. For instance, in case studies we find that 
simulation times of ATM-based cellular networks can be an order of magnitude 
shorter than with most of the readily available simulators. Though the 
simulation environment described here is specific to ATM/AAL2-based mobile 
networks, the proposed concept is more widely applicable to accelerate 
simulations. 



Keywords 

hierarchical simulation, ATM, AAL2, mobile systems 



1 INTRODUCTION 

While the penetration of cellular mobile phones increases rapidly and may 
soon catch up with the ordinary telephone penetration, there is already a 
clear trend that goes beyond this quantitative change. The appearance of 
new applications with higher bit rate and diverse GoS and QoS requirements 
imposes new requirements both on the radio interface and on the fixed network 
infrastructure. 

In this environment the efficient use of the network’s resources will be 
increasingly important. While peak allocation and eventually overprovisioning 
may be adequate in today’s single-application mobile networks, in the future 
mobile operators will need to raise efficiency as far as possible. To do this, 
operators might need to apply variable bit rate coding of traffic, use ATM as 
transport network infrastructure (Eneroth 1997) and optimize resource 
management. The key to efficiently exploiting this more complex system might 
be to develop an accurate way to analyze its performance - if possible, before 
actually building it. 

For any telecommunication system, performance analysis can rely on the 
following approaches, exhaustively described by (Kurose 1988): 

• analytical methods, 

• simulation, 

• measurement and prototyping. 



Measurements and prototyping usually provide the most precise and reliable 
information, but the use of this technique is often very expensive, 
time-consuming and inflexible. Analytical evaluation methods give a larger 
freedom in varying the investigated system’s parameters, but their 
applicability is restricted by the need to find an analytically tractable 
model. With simulation techniques the level of abstraction can be freely 
determined, though it largely affects the required processing capacity and the 
accuracy. As none of the three approaches provides an ideal solution in all 
situations, the analysis of a complex system must use a combination of these. 

In this paper we describe an extension of the PLASMA ATM simulator, 
first described by (Haraszti 1995), which makes it capable of efficiently 
simulating ATM-based wideband cellular networks. We argue that in order to 
meet the above listed requirements a new simulation technique needs to be 
considered. The proposed hybrid hierarchical simulation environment is 
designed specifically for the performance analysis of systems where the 
complexity requires a combination of analytical techniques. The simulation is 
largely based on the integration of analytical results into the simulation, 
which, together with the hierarchical structure, makes it capable of 
simulating a large and complex network without hiding the bit-level details or 
radio-related features behind a high-level abstraction. 

After a brief introduction to the proposed concept, the applied model 
and the simulator’s architecture will be described. Two simulation examples 
will also be provided to illustrate the simulator’s capabilities. The 
examples are taken from the analysis of the new ATM Adaptation Layer No. 2 
(AAL2), where standardization activity was based on detailed performance 
evaluations (Eneroth 1997). The new adaptation layer provides high efficiency 
and low delay for cellular transport. In this investigation of the adaptation 
layer’s performance the simulation environment described here played an 
important role. 



2 SIMULATION TECHNIQUES 

For large and complex systems a fully detailed simulation of the entire problem 
is often unrealistic. A byte-level simulation of a single ATM connection is so 
time-consuming that it is impractical in real investigations. While in simpler 
systems (PSTN or other constant bit-rate, single application communication 
systems) a higher level investigation may be appropriate, a more sophisticated 
system’s characteristics such as bit error rate or delay can depend largely on 
lower level behaviour. 

In the simulation of ATM based wide band cellular networks an additional 
difficulty arises from the fact that events at various levels of abstraction and at 
various time scales need to be modeled and simulated. For instance, low level 
changes in the quality of the radio interface may trigger a handover event at 
the connection level, which, in turn, may have cell level consequences inside 
the affected switches. We observe that this basic characteristic imposes two 
major general requirements on an efficient and practically useful simulator: 

• the description, modeling and simulation of the system must be able to 
capture relevant events at whatever level of abstraction they happen; 

• the description and modeling of the system must support the simulation of 
events at whatever time scale they happen. 

(Note that the term relevant here refers to application specific modeling 
details.) We refer to these two basic requirements as spatial and temporal 
scalability of the simulator respectively. 

Extending the classification of (Frost 1988) and (COST 1992) the various 
techniques for enhancing modeling and simulation efficiency of complex sys- 
tems fall into the following broad categories: 

• hybrid models increase the efficiency of the simulation by combining ana- 
lytical models with simulation, see e.g. (O’Reilly 1984), (Lavenberg 1979) 
and (Frater 1989). Our method inherits the basic (rather general) idea of 
combining analytical and simulation techniques, as described in Section 3. 

• variance reduction techniques improve computational efficiency by using 
statistical methods to obtain more accurate performance measures, as in 
(Shanmugan 1980), (Villen-Altamirano 1991), (Law 1991), (Fishman 1983), 
(Rubinstein 1985) and (Lavenberg 1982). We have found that finding a 
good probability transform at various abstraction and time scales can be 
difficult. Even though these methods offer a considerable increase of 
simulation speed without requiring more processing capacity, so far their 
applicability has only been shown for relatively simple examples, and their 
extension to more realistic problems needs further research. For an overview 
of these and other special simulation techniques, including hybrid and 
hierarchical simulation, see (Frost 1988) and (COST 1992). 

• extrapolative methods increase computational efficiency of a simulation by 
employing statistical methods to estimate the tail probability distribution 
outside the sample range (Jeruchim 1984), (Weinstein 1971), (Berberana 
1990), (Dijk 1991). 

• parallel and distributed methods attempt to reduce the simulation time 
by employing more computer resources, see e.g. (Fujimoto 1994) and (Pham 
1997) and the references therein. The performance of even advanced parallel 
simulation techniques, however, does not seem to justify the additional 
programming effort which is needed in the decomposition and synchronization 
tasks inherent in such techniques. 

• co-simulation techniques aim at loosely interconnecting two or more 
independently running simulators of different abstraction levels by allowing 
them to exchange messages. This approach, though attractive, often suffers 
from problems caused by timing and causality constraints (Coppola 
1997). The challenge of efficient communication between the various levels 
in multiple time scale simulations is addressed in e.g. (Hines 1997), but the 
solution proposed there is not directly applicable to communication 
networks. Our approach is in fact a one-directional co-simulation technique, 
also importing ideas from the hybrid approach. The main benefit of these 
changes is that the higher level simulator never needs to await results from 
the lower level counterpart. Instead, when needed, the higher level simulator 
uses predictions. 



Recently a new and interesting approach, the fluid-flow simulation
technique, has been proposed in (Kesidis 1996) and (Gustafsson 1997). It
adopts basic concepts from the fluid-flow analysis approach into traditional
discrete event simulation techniques. This approach appears to be very
efficient in deriving performance measures at the cell level in ATM networks,
but seems difficult to extend to the network level so as to meet both our
spatial and temporal requirements at the same time.

Also recently, some work has been started on the simulation of wireless
networks and services; (Mishra 1996) gives a classification of the proposed
solutions with some references. An interesting simulation of voice over ATM
can be found in (Iyer 1997), but this approach does not aim at scalability to
large networks. Parallel simulation is proposed by (Liljenstam 1997). As in the
case of broadband networks, the main problem is finding the proper balance 
between modelling complexity and simulation efficiency. Our solution leaves
this choice to the user, who can set the level of abstraction depending on the
actual problem and on the required precision.

Figure 1 Hierarchical simulation: traditional approaches (a and b) and with
flexible device-level simulator (c).



3 PROPOSED HIERARCHICAL SIMULATION APPROACH 



3.1 Concept 

In the case of fixed cellular networks the main focus of performance analysis is
on the trade-off between network utilization and per-connection service quality 
parameters. Typically, the network’s response to different connection control 
and routing strategies needs to be evaluated with service quality requirements 
as optimization constraints. This kind of investigation requires that the entire 
network be studied while the model is detailed enough to include the internal 
structure of network elements down to queues and processors. As this does not 
seem to be feasible in one simulator we propose a hierarchical decomposition 
of the problem. 

As Figure 1 shows, in a hierarchical decomposition either the lower level
simulator(s) provide characteristics of a number of identical or similar
network entities (Figure 1a), or a dedicated lower level simulator must be
assigned to each network element of interest (Figure 1b). The former
solution relies on an inherent feature of the investigated system, namely
a number of identical network elements working in similar circumstances
(which does not necessarily apply to cellular systems). The latter requires
a number of simulators running in parallel, which may lead back to the
problem of insufficient processing capacity, with the additional problem of
requiring a specific simulator for each network element of interest.

In our approach only one simulator is used at the lower level (Figure 1c),
but it is designed in a flexible way that allows for the device-level
simulation of an almost arbitrary subset of the entire network. The
simulation speed will, of course, depend on the size and complexity of the
selected subset. This flexible device-level simulator is equipped with a
communication interface using a configuration description language designed
specifically for this purpose. Configuration descriptions arriving at this
interface are interpreted inside the device-level simulator and a simulation
session starts immediately. After the session, simulation results are
available through the same interface.

The device-level simulator can, of course, simulate only one configuration at
a time. Instead of being defined in advance, however, this configuration is
set and reset dynamically by the network-level simulator during simulation
time. A single device-level simulator running orders of magnitude slower than
the network-level simulator cannot continuously provide information on the
behaviour of each network element. This is, however, not necessary if the
primary output of the investigation is the system’s call-level behaviour,
typically the load at certain network elements, the call blocking ratio or
network revenue as a function of different routing or admission control
policies. While lower level parameters, for instance Quality of Service
parameters, might be of equal importance, their exact value is usually of no
interest as long as they satisfy certain system-specific bounds.

In this kind of study a pure network-level simulation is not sufficient, as
it does not give reliable information on the number of times the bounds are
violated. A full cell-scale simulation, however, is not only infeasible but
also a waste of CPU time: it provides, for example, cell delay values with
millisecond precision for paths where delays are tens of milliseconds below
the limit of tolerable delay. Our approach avoids this waste by concentrating
device-level simulation power on the points in time and the areas of the
network when and where violation of the bounds is suspected to be frequent.
Simulation is primarily performed on the upper level, allowing the user to
focus on network-level behaviour. As not all network elements are simulated
at the device level, the call-level simulator must be prepared to estimate
device-level behaviour, typically based on the equivalent bandwidth approach.
The precision of this estimate determines the network control and behaviour.
However, thanks to the device-level simulation sessions, the accuracy of the
information learned from the simulation is not limited by the estimate. In
brief, we obtain accurate information, on both call and cell level, about a
network controlled by inaccurate estimates. Despite its limitations, this
method resembles real networks, which are typically controlled by inaccurate
estimates but allow for measurements of “arbitrary” precision, while
improvement of the estimates is based upon feedback from these measurements.

Figure 2 Simulation time scales.



3.2 Communication and synchronization 

Figure 2 schematically illustrates the concept in operation for an elemen- 
tary example: packet-switched connections being established and released on 
a single physical link. The network level simulator controls the process by 
calculating the connections’ aggregated equivalent bandwidth and applying 
admission control. Looking at just the network level simulator’s output gives
sufficient information on the revenue subject to the equivalent bandwidth
estimate. It also tells us that during “most of the time” the service
quality must have been satisfactory, since the estimated equivalent bandwidth
was far below the link capacity.

Device-level simulation sessions triggered at critical time spots supplement
this information with service quality parameters when cell loss was not
negligible. At the end of the simulation shown in the figure, we will have
information on revenue and on service quality subject to the equivalent
bandwidth estimate.
At the same time device-level simulation results on service quality provide a 
cross-checking of the estimation accuracy and eventually give indications of 
its error. 

In this basic example the gain compared to a full cell-scale simulation only 
comes from omitting, in the device-level simulator, periods other than the 
critical periods. However, as we will see in the examples of Section 5, for 
larger networks the “cut” can be made both in time and in complexity: at 
the device-level we only simulate the network elements “seriously affected” by 
the critical period. This further increases simulation performance, but at
the price of introducing device-level simulation inaccuracy due to the
neglected network areas.

The time scales shown in Figure 2 also show how simulation performance
can be traded for accuracy by modifying the definition (threshold) of “critical
period”, hence changing device-level simulation frequency. In case No. 2 an
additional point in time was considered critical compared to case No. 1. This
obviously increases confidence in results concerning service quality, but it
also increases the number of device-level simulation sessions and hence
decreases overall simulation speed. By further adjusting device-level session
frequency, the
precision can be set freely ranging from pure network-level to pure device-level 
simulations. 
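
The trade-off described above can be illustrated with a small sketch. It
assumes a hypothetical trace of equivalent-bandwidth estimates sampled at
regular intervals and counts the samples at which the load ratio reaches the
critical threshold, i.e. the instants that would have to be covered by
device-level sessions. The function name `critical_samples`, the trace and the
capacity are illustrative assumptions, not values from the simulations
reported here.

```python
def critical_samples(eb_trace, capacity, threshold):
    """Count trace samples whose load ratio reaches the critical threshold."""
    return sum(1 for eb in eb_trace if eb / capacity >= threshold)


# Hypothetical equivalent-bandwidth estimates in kbps (illustrative only).
trace = [400, 700, 1100, 900, 1400, 900, 1200, 1000, 1450, 800, 1100, 600]
capacity = 1500  # hypothetical link capacity in kbps

for threshold in (0.9, 0.7, 0.5):
    n = critical_samples(trace, capacity, threshold)
    print(f"threshold {threshold:.0%}: {n} of {len(trace)} samples are critical")
```

Lowering the threshold makes more of the trace critical, which is the
mechanism by which precision is bought with device-level run time.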

Note that although it would be possible to actually use device-level
simulation results in the network-level simulator, in our system this is not
the case. While device-level simulation sessions are triggered and configured
by the network simulator, their results are only fed back to the network
simulator to make up a compact representation of the simulation results; they
are not used to automatically modify the estimation algorithm, which is still
up to the human user. The two simulators can therefore run independently; if,
for example, they share the same CPU, the network-level simulator might, but
does not have to, wait for the device-level simulation session. In our
experimental implementation, unlike in Figure 2, several instances of the
device-level simulator may run in parallel, which allows the system to make
use of a network of computers.
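
The one-directional coupling can be sketched as follows: the network-level
loop submits a session for each critical event and carries on, collecting the
results only for the final report. This is a minimal sketch using a thread
pool from the Python standard library; `device_level_session` is a
hypothetical stand-in that merely echoes its configuration, and the real
environment uses separate simulator processes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor


def device_level_session(config):
    """Hypothetical stand-in for a device-level session."""
    # A real session would simulate the configured nodes at packet level;
    # here we only echo the configuration and leave the result empty.
    return {"time": config["time"], "nodes": config["nodes"], "loss": None}


def run_network_level(critical_events, workers=2):
    """Submit one session per critical event without waiting for its result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(device_level_session, ev) for ev in critical_events]
        # The network-level simulation would continue here, unaffected by the
        # sessions. Results are gathered only to build the final report.
        return [f.result() for f in futures]


events = [{"time": 10, "nodes": ["RNC1"]}, {"time": 25, "nodes": ["RNC1", "MSC"]}]
report = run_network_level(events)
print(report)
```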



3.3 Limitations and drawbacks 

Despite its flexibility, the proposed concept has some inherent limitations.
In its current form, it does not eliminate the need for equivalent bandwidth 
estimation in the network-level simulator. The hierarchical structure requires 
that each network element and each traffic source be assigned a model at both 
levels. Furthermore, as traffic is primarily simulated on call scale, the study 
of connection-less traffic is excluded. Due to the load-dependent simulation 
sessions, overall simulation time will depend on network load rather than on 
the amount of traffic. 

The concept can be extended by letting device-level simulation results
modify the estimation technique used in the network simulator when the match
between estimate and simulation proves to be poor. Though this extension
offers some significant advantages
(primarily the possibility of “learning”), it also brings up new problems, in 
particular the appearance of a closed control loop and the degradation of accu- 
racy due to the two simulations being repeatedly based on each other’s results. 
This is, however, probably the most promising direction to extend the concept. 
Further extension possibilities include the application of dynamic device-level 
simulations and the introduction of a control level that allows for starting the 
simulation without an equivalent bandwidth estimation method and building 
one up gradually while device-level session frequency is decreased.

4 THE SIMULATION ENVIRONMENT



4.1 Model 

The network-level simulator is an extended version of the PLASMA
ATM simulator, inheriting most of its networking capabilities. For a
detailed description of its functionalities and structure, the reader is
referred to (Haraszti 1995); here we only list the most important features.

Traffic in the network simulator is modelled at connection-level where users 
are characterized by calling behaviour and mobility parameters. Users ran- 
domly initiate calls of different applications where each application is assigned 
a traffic description, a set of service quality requirements and a priority level. 
The traffic descriptions might also have some open parameters (e.g. peak rate) 
for which values are taken from a probability distribution at call initiation. 
The traffic descriptions, service quality requirements and priority level are 
used both by the network simulator for the equivalent bandwidth calculations 
and by the device-level simulator to build up the traffic generator and to 
handle traffic in the network. 

At connection setup Call Admission Control is performed for each hop on 
an equivalent bandwidth basis. In each node a destination-based fixed or fixed 
alternate routing decision is taken. 
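
As a rough illustration of per-hop admission control with fixed alternate
routing, the sketch below accepts a connection on the first route on which
every link can carry the additional equivalent bandwidth. It is a simplified
sketch: routes are given as complete link lists rather than decided node by
node, and the link names, capacities and loads are hypothetical.

```python
def admit(path, eb, used, capacity):
    """True if every link on the path can carry the extra equivalent bandwidth."""
    return all(used[link] + eb <= capacity[link] for link in path)


def setup_connection(routes, eb, used, capacity):
    """Try the primary route first, then the alternates, in fixed order."""
    for path in routes:
        if admit(path, eb, used, capacity):
            for link in path:
                used[link] += eb  # reserve the bandwidth on each hop
            return path
    return None  # call blocked on all routes


# Hypothetical link capacities and current equivalent-bandwidth load (kbps).
capacity = {"BS1-RNC1": 1500, "RNC1-MSC": 1500, "RNC1-RNC2": 1500, "RNC2-MSC": 1500}
used = {"BS1-RNC1": 200, "RNC1-MSC": 1480, "RNC1-RNC2": 100, "RNC2-MSC": 300}
routes = [
    ["BS1-RNC1", "RNC1-MSC"],               # primary route
    ["BS1-RNC1", "RNC1-RNC2", "RNC2-MSC"],  # alternate route
]

chosen = setup_connection(routes, 64, used, capacity)
print(chosen)
```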

In accordance with the UMTS (Universal Mobile Telecommunication System)
concept we have studied networks consisting of MSC (Mobile Switching
Centre), RNC (Radio Network Controller) and BS (Base Station) types of
nodes. The nodes are interconnected by ATM VCCs; voice traffic is carried in
AAL2 connections according to the recent ITU recommendation (AAL2 1997).
Switching is performed in MSCs and RNCs only; BSs originate and terminate
connections. Though the simulation environment includes the modelling of the
air interface, radio characteristics and mobility, these are outside the scope of 
this paper and can simply be considered as a modulation of user location and 
call behaviour and as a shaping of traffic arriving at the BS. 

In the device-level simulator traffic is modelled at packet level. The 
basic unit of user information is the air frame that corresponds to the amount 
of data transmitted to/from the mobile terminal in one burst. Traffic sources 
generate air frames with stationary interarrival time and size distributions 
defined by the application-specific traffic descriptions. The mechanism is iden- 
tical for upstream and downstream but data connections are not necessarily 
symmetrical or bidirectional. 

Air frames belonging to one connection are transmitted between a BS and
an MSC through a number of hops in dedicated AAL2 or ATM connections for
voice and data sources, respectively. The establishment and release of these
connections is not modelled in the device-level simulator. As defined in the
recommendation, air frames from voice sources are packed for transmission in
AAL2 packets that in turn are packed into ATM cells as shown in (Eneroth
1997).

All AAL2 connections on a link are multiplexed in a single VCC and this 
“composite” voice trunk shares the link with other VCCs carrying traffic from 
one data source each. In order to allow mobility-related data processing and to 
support handovers, connections are demultiplexed and re-multiplexed in the 
RNCs. The multiplexing of AAL2 packets into ATM cells and of ATM cells 
onto the link is modelled as single-server finite-buffer queues where service 
time is dependent on ATM cell rate and on link capacity at the two levels, 
respectively. 
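
A single-server finite-buffer queue of the kind used at both multiplexing
levels can be sketched in a few lines. This is a generic illustration with a
deterministic service time, not the model code of the device-level simulator;
the function name and the example values are hypothetical.

```python
def simulate_queue(arrival_times, service_time, buffer_size):
    """Single-server FIFO queue with room for buffer_size waiting units.

    Returns (departures, lost): the departure times of the accepted units and
    the number of arrivals dropped because the buffer was full.
    """
    departures = []
    lost = 0
    for t in sorted(arrival_times):
        # Units still in the system at time t (in service or waiting).
        in_system = sum(1 for d in departures if d > t)
        if in_system > buffer_size:  # server busy and all waiting places taken
            lost += 1
            continue
        start = max(t, departures[-1]) if departures else t
        departures.append(start + service_time)
    return departures, lost


# Four units arrive together, a fifth arrives later; two waiting places.
departures, lost = simulate_queue([0.0, 0.0, 0.0, 0.0, 5.0],
                                  service_time=1.0, buffer_size=2)
print(departures, lost)
```

In the two-level case the same routine would be used twice, with the service
time derived from the ATM cell rate at the AAL2 level and from the link
capacity at the ATM level.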



4.2 Network to device-level mapping 

To initiate a device-level simulation session first the area defined as “critical” 
needs to be determined, then both the configuration and the actual traffic 
situation must be mapped on the device-level model. In the experimental 
implementation the critical area can be defined with the granularity of one 
node, that is, each node of the network is assigned a model on device-level 
which is either entirely included in a session or is omitted. Hence to investi- 
gate an overloaded link, apart from the link itself the node generating traffic 
in the overloaded direction must be simulated at device level. With this limi- 
tation, once the selection of critical area is made, the configuration mapping 
is straightforward. 

As statistical resource allocation in the ATM/AAL2 system is performed 
at both multiplexing levels, an equivalent bandwidth estimate is necessarily 
maintained at each AAL2 or ATM multiplexer. Correspondingly, thresholds of 
“critical load” can be specified for each level. We refer to these as the critical 
thresholds. A device-level simulation session is triggered for a multiplexer if 
the total equivalent bandwidth of its connections related to the multiplexer’s 
output rate reaches a critical threshold. 

Traffic is mapped onto the device-level model per connection: for each
connection that contributes to the traffic in the critical area a traffic
generator is built up using the application-specific traffic description and
the connection-specific parameters. To take into account the shaping effect
of previously passed switches, the ingress ports of the overloaded switches
are also modelled.
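
The mapping steps of this section can be sketched as two small functions: one
selects the multiplexers whose equivalent-bandwidth load ratio reaches the
critical threshold, the other lists the connections for which a traffic
generator must be built. The data structures and names below are hypothetical
and chosen only for illustration.

```python
def critical_multiplexers(eb_total, output_rate, threshold):
    """Multiplexers whose equivalent-bandwidth load ratio reaches the threshold."""
    return {m for m in eb_total if eb_total[m] / output_rate[m] >= threshold}


def connections_to_map(connections, critical):
    """Connections whose path crosses at least one critical multiplexer."""
    return [c for c, path in connections.items() if critical & set(path)]


# Hypothetical state of the network-level simulator (rates in kbps).
eb_total = {"RNC1->MSC": 1440, "RNC2->MSC": 600}
output_rate = {"RNC1->MSC": 1500, "RNC2->MSC": 1500}
connections = {
    "c1": ["BS1->RNC1", "RNC1->MSC"],
    "c2": ["BS3->RNC2", "RNC2->MSC"],
    "c3": ["BS2->RNC1", "RNC1->MSC"],
}

critical = critical_multiplexers(eb_total, output_rate, threshold=0.95)
print(critical, connections_to_map(connections, critical))
```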



4.3 Simulation architecture 

The hierarchical architecture of the simulation environment is illustrated in
Figure 3. The figure also shows that the network-level simulator is actually
built up of two interconnected simulators: a mobility and air interface
simulator and the network simulator. As the modelling of mobility and radio
characteristics is outside the scope of this paper, however, the former can
simply be considered as a modulation of user behaviour and we can focus our
attention on the latter.

Figure 3 The simulation architecture.

The simulators are built in an object-oriented way around discrete
event-driven kernels in the PLASMA simulation and management environment
(Haraszti 1995). The simulators communicate through a CORBA interface,
allowing the network simulator to “see” a device-level simulator as if it were
one of its internal simulation objects. This not only makes communication
simple but directly allows for multiple instances of device-level simulators
making use of a network of computers.



4.4 Validation 

The hierarchical structure of the proposed simulation environment implies 
that validation must be performed for both the upper and the lower level 
simulators and for the hybrid system. In this section we present one example 
from a set of test cases that we have used for validation purposes. Since the 
hybrid system is expected to approximate the results of the pure device-level 
simulation, its validation has been based on the comparison with device-level 
results and will be discussed together with the numerical examples in Sec- 
tion 5. The difficulty of validating the numerical results obtained with the hy- 
brid simulator stems from the fact that the studied ATM/AAL2 technology is 
currently being standardized, and prototype systems are under construction.

Figure 4 Simulated and calculated maximum number of connections in the
$N \cdot D^{[X]}/D/1$ queueing model.



In order to validate the network level part of the simulator we have consid- 
ered a number of cases detailed in (Lin 1978) and (Magi 1996). In (Magi 1996) 
a model for multirate circuit switched loss networks with non-zero call pro- 
cessing time is developed, which allowed us to compare simulation and ana- 
lytical/approximative results in non-trivial cases. 

To validate the device level part of the simulator, we have used two test
cases where comparison with analytical/approximative techniques is feasible.
First, we consider a single queue-single server system with batch arrivals as in
(Bisdikian 1996). This queueing system was chosen because it plays
an important role in the modelling of any system which carries compressed
variable bit rate voice samples over ATM, most notably in the modelling of
GSM/UMTS systems with AAL2 transport, see e.g. (Valko 1997).

Figure 4 is an example out of the series of test cases where we have
compared analytical and simulation results on this queueing system.
Specifically, we use the $N \cdot D^{[X]}/D/1$ version of this queueing system
to study the maximum number of allowed AAL2 channels over an ATM VCC with a
given QoS constraint. This queueing system operates as follows. Independent,
identically distributed batches of random size x arrive at the queue from N
independent sources at discrete, deterministic time intervals. If all N
batches arrive at the same point in time, we refer to the system as one with
a single offset. The batches are then queued and served in a FIFO manner. If,
on the other hand, groups of batches arrive at different points in time, we
refer to the system as one with multiple offsets. For instance, if the N
independent sources are grouped into four groups, we say that the system is a
four-offset system. This is the case in a cellular network, where all mobiles
belonging to a base station are synchronized such that there is a
deterministic time offset between groups of mobiles, each communicating with
the base station at periodic, discrete time intervals. Here, x corresponds
to the number of code bits (and the size of the AAL2 packet) generated by the
voice coder in the mobile terminal. Since the QoS constraints, such as the
delay of the air frames, have to be fulfilled even for the last arriving
packet in any batch, it is clear that the multiple offset synchronization
method allows for more connections on a link with fixed capacity, as shown in
Figure 4. The figure also shows acceptable agreement between the simulated
and the calculated results.
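
The effect of multiple offsets can be reproduced with a deliberately
simplified sketch. It assumes that every source sends one packet of equal size
per period, so that the service time is deterministic (the model above uses
random batch sizes), and it searches for the largest number of sources for
which the last-served packet still meets a delay bound. All names and
numerical values are hypothetical.

```python
def worst_case_delay(n_sources, offsets, period, service_time):
    """Largest queueing-plus-service delay seen by any packet in one period."""
    # Sources are spread round-robin over the offset groups; group g sends at
    # time g * period / offsets.
    arrivals = sorted((i % offsets) * period / offsets for i in range(n_sources))
    free_at = 0.0  # time at which the server becomes idle
    worst = 0.0
    for t in arrivals:
        start = max(t, free_at)
        free_at = start + service_time
        worst = max(worst, free_at - t)
    return worst


def max_connections(offsets, period, service_time, delay_bound):
    """Largest number of sources that keeps the worst-case delay within the bound."""
    n = 0
    while True:
        candidate = n + 1
        overloaded = candidate * service_time > period
        delay = worst_case_delay(candidate, offsets, period, service_time)
        if overloaded or delay > delay_bound:
            return n
        n = candidate


# Hypothetical values: 10 ms period, 0.25 ms per packet, 2 ms delay bound.
for k in (1, 4):
    print(k, "offset(s):", max_connections(k, 10.0, 0.25, 2.0))
```

With these values the four-offset system admits four times as many sources as
the single-offset one, because each group finds the queue empty on arrival.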

Building on these ideas and results from $N \cdot D^{[X]}/D/1$ type queueing
networks, we have compared analytical and simulation results in some
multi-node cases as well, and have found that the device level simulator
performs well.



5 SIMULATION EXAMPLES AND RESULTS 



5.1 Single-link example

In this example voice and data connections are established and released on
a link of capacity $C = 1.5$ Mbps. 50 voice and 20 data sources initiate calls
according to Poisson arrival processes with parameters $\lambda_v = 0.002$
and $\lambda_d = 0.001$, respectively, and maintain the connections for
exponentially distributed times with parameters $m_v = 500$ sec and
$m_d = 1000$ sec, respectively. Active voice sources generate packets with a
constant inter-arrival time $T = 10$ ms, where the packet size is determined
by an embedded state machine of four states such that the mean rate is
^kbps and the peak rate is 20 kbps. The measurement-based four-state model is
extensively described in (Valko 1997). Active data sources are of on-off
behaviour with exponentially distributed “on” and “off” period lengths with
parameters $\alpha_{on}/\alpha_{off} = 0.23$ and rate $r = 64$ kbps in the
“on” state. Traffic sources are all independent. Both applications tolerate a
maximum packet loss probability of 10“^.

Active voice sources are assigned an AAL2 connection each and are all
statistically multiplexed in a single ATM VCC. This VCC is in turn
statistically multiplexed with, and prioritized over, the VCCs assigned to the
active data sources, one each. Such scenarios are expected in ATM-based
cellular networks, see e.g. (Eneroth 1997). Figure 5 shows the equivalent
bandwidth estimate maintained by the network-level simulator during a
1000-minute simulation. Arrows show the points in simulation time where a
critical threshold was reached and a device-level session was triggered. The
labels attached to these points show the results obtained by the sessions.

Looking at the device-level simulation triggered at the 90% threshold we
observe that the equivalent bandwidth was underestimated, since at the traffic
peaks service quality was poorer than required. However, from these results
alone we are unable to accurately estimate the per-connection service quality,
since device-level results are available for the highest traffic peaks only.
By lowering the critical threshold first to 70% and then to 50% we trigger
more frequent device-level sessions. From the figure we see that this gives
more accurate information on QoS parameters. By further lowering the critical
threshold, the pure device-level simulation can be approached.

Figure 5 Simulation results for the single-link case.

Figure 6 Performance penalty of increased accuracy in the single-link case.

Figure 6 shows the performance penalty of the increased triggering
frequency. On the horizontal axis the critical threshold varies from 100% (pure
network level simulation) to 50% (practically a pure device level simulation).
The curves show the accuracy of the equivalent bandwidth and the required
run time, respectively. The accuracy is expressed in terms of the number of
occurrences when the QoS constraints are violated, which corresponds to the
“extra” information gained from the hierarchical system compared to the pure
network level simulator. We observe that, approaching the pure device level
simulation, the run time increases drastically while the accuracy saturates,
which justifies using the hybrid approach.

Figure 7 Network example - configuration. Offered load: 927 - 2411 kbps;
offered load: 301 - 783 kbps.



5.2 Network example 

As a network example we have used the configuration shown in Figure 7. This 
corresponds to two base station sub-systems each consisting of two BSs and 
one RNC connected in a ring. The two sub-systems are connected to an MSC 
node. Mobile users generate voice and data traffic with traffic parameters 
and QoS requirements as in the previous example. In the first sub-system 
(left side) the offered traffic is significantly higher than the engineered traffic 
which results in an overload on the corresponding RNC-MSC link causing 
high call blocking probability. 

In this example we are primarily interested in the decrease of call block- 
ing probability if we apply load sharing to make use of a direct RNC-RNC 
connection. In Figure 8 this improvement is shown while the total offered 
traffic is varied on the horizontal axis. In accordance with expectations the 
blocking could be decreased by applying load sharing between the two RNCs. 
However, these changes also affect QoS parameters that are not shown by a 
pure network-level simulation while the cell-level simulation of this network 
is infeasible. 

By exploiting hierarchical simulation we can monitor the packet level QoS
without an unacceptable simulation time. In Figure 9 and Figure 10 the total
time of QoS violation out of a 500-minute simulation is shown for the
original configuration and for the one with load sharing on the RNC-RNC link,
respectively. We can observe that without load sharing QoS is often violated
on the overloaded RNC-MSC link and never on the other one. We note in
Figure 10 that applying load sharing results in similar service quality in the
two subsystems.

Figure 8 Blocking probabilities with and without load sharing.

Figure 9 QoS violation without load sharing [min]. Horizontal axis: total
offered load [Mbit/s], 2.5 to 6.5.

For these simulations the critical threshold was set to 95% for each
multiplexer and the network to device-level mapping was configured such that
only an overloaded multiplexer and its outgoing link were simulated in each
device-level session. With this setting a 500-minute simulation took 300
to 700 minutes depending on total offered load, while a complete device-level
simulation would take approximately 120 minutes per simulated minute.
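
The gain implied by these run times can be worked out directly. The short
calculation below uses only the figures quoted above; the function name
`speedup` is ours.

```python
def speedup(simulated_minutes, device_minutes_per_minute, hybrid_minutes):
    """Ratio of full device-level run time to hybrid run time."""
    full_device_level = simulated_minutes * device_minutes_per_minute
    return full_device_level / hybrid_minutes


# 500 simulated minutes at about 120 minutes each, against 300 to 700 minutes.
for hybrid in (300, 700):
    print(f"hybrid run of {hybrid} min: about {speedup(500, 120, hybrid):.0f}x faster")
```

A complete device-level run would thus need about 60000 minutes, so the
hybrid simulation is roughly 86 to 200 times faster in this example.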

Figure 10 QoS violation when load sharing is applied [min].



6 CONCLUSIONS 

In this paper we described a simulation environment optimized for the per- 
formance analysis of wideband cellular systems. Due to the large complexity 
of these systems we saw the need to develop a new hierarchical simulation 
concept which combines the advantages of fast network-scale simulators and 
detailed device-scale simulators. 

Using the proposed concept a fast network level simulation can be per- 
formed while conformance to low level quality requirements is also monitored. 
A sub-set of a large network simulated on call-scale can be “magnified” and 
simulated on device-scale if increased traffic or a failure situation deserves a 
closer look. Last but not least, a simulation study can be placed at an arbi- 
trary point of the “simulation speed versus model precision and confidence” 
trade-off depending on the requirements of the actual analysis. 

These features are made possible by the following main characteristics: 



• the simulation environment is built up in a hierarchical structure where 
the lower level simulator is a generic device-level simulator in which the 
simulated configuration can dynamically be updated (triggered from the 
upper level simulator), 

• the upper level simulator performs estimates on low-level parameters and 
uses the lower level simulator to check these estimates, 

• low-level characteristics such as bit error rate and cell transfer delay are 
not the main output of the simulator: we can specify bounds on these and 
are only interested in the probability of these bounds being violated while 
our main interest is on network-level behaviour.